Human-in-the-Loop Secure Code Synthesis: Integrating Security Heuristics in AI Code Generation
DOI: https://doi.org/10.63282/3050-922X.IJERET-V6I4P101
Keywords: Secure Code Generation, Large Language Models, Human-in-the-Loop, Static Analysis, Federated Learning, Explainable AI, Application Security
Abstract
The proliferation of Large Language Models (LLMs) as code generation assistants has revolutionized software development but simultaneously introduced a new vector for security vulnerabilities. These models, trained on vast repositories of public code, often replicate insecure patterns, leading to the generation of code susceptible to common exploits. To address this challenge, we propose a novel Human-in-the-Loop (HITL) framework for secure code synthesis that synergistically combines real-time static analysis with LLM-based code generation directly within the Integrated Development Environment (IDE). Our system, named SecurifyAI, employs an iterative refinement loop where code snippets generated by an LLM are immediately scrutinized by a lightweight, high-speed Static Application Security Testing (SAST) engine. The identified potential vulnerabilities are then translated into contextual feedback to guide the LLM in regenerating a more secure version of the code. This proactive approach shifts security from a post-development afterthought to an integral part of the code creation process. Furthermore, we introduce a trust metric-based federated learning (FL) framework to continuously improve the underlying LLM's security awareness across distributed, privacy-sensitive environments. This FL approach ensures integrity and accountability by weighting contributions from different clients based on a calculated trust score. Finally, we propose a formal model to quantify and optimize the inherent trade-off between the explainability of security warnings and the performance of the code generation system. Our experimental results, conducted on a curated dataset of insecure code patterns, demonstrate that SecurifyAI reduces the incidence of common vulnerabilities (CWEs) by over 80% compared to baseline LLM assistants, while maintaining acceptable latency and improving developer code acceptance rates through clear, actionable feedback.
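For concreteness, the listing below is a minimal sketch of the generate-analyze-regenerate loop described above, not the exact SecurifyAI implementation: llm_generate, run_sast, and build_security_feedback are hypothetical stand-ins for the LLM backend, the lightweight SAST engine, and the step that translates findings into contextual prompt feedback, and the fixed iteration bound is an assumed safeguard to keep in-IDE latency acceptable.

    # Sketch of the iterative HITL refinement loop (helper functions are
    # hypothetical placeholders, not the paper's actual API).
    MAX_ITERATIONS = 3  # assumed bound to cap added IDE latency

    def secure_synthesis(prompt: str):
        """Regenerate code until the SAST engine reports no findings or the
        iteration budget is exhausted; returns the code and residual findings."""
        code = llm_generate(prompt)
        findings = run_sast(code)  # e.g. list of (CWE id, line, message) tuples
        for _ in range(MAX_ITERATIONS):
            if not findings:
                break  # code passed the security gate
            # Translate findings into contextual feedback and feed it back into
            # the prompt so the LLM regenerates a hardened version of the snippet.
            feedback = build_security_feedback(findings)
            code = llm_generate(prompt + "\n\n" + feedback)
            findings = run_sast(code)
        return code, findings

The trust metric-based FL step can likewise be read as a weighted variant of federated averaging. The sketch below assumes each client submits a dictionary of parameter updates together with a server-computed trust score in [0, 1], and that scores are simply normalized into aggregation weights; the paper's actual trust metric and aggregation rule may differ.

    # Sketch of trust-weighted aggregation of client model updates
    # (illustrative only; the concrete trust score computation is assumed).
    import numpy as np

    def trust_weighted_aggregate(client_updates, trust_scores):
        """Average per-parameter updates, weighting each client by its
        normalized trust score rather than treating all clients equally."""
        total = sum(trust_scores)
        weights = [score / total for score in trust_scores]
        aggregated = {}
        for name in client_updates[0]:
            aggregated[name] = sum(
                w * np.asarray(update[name])
                for w, update in zip(weights, client_updates)
            )
        return aggregated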