AI-for-Code (AI4Code) systems are reshaping software engineering, with tools like GitHub Copilot accelerating code generation, translation, and vulnerability detection. Alongside these advances, however, security risks remain pervasive: insecure outputs, biased benchmarks, and susceptibility to adversarial manipulation all undermine reliability. This SoK surveys the landscape of AI4Code security across three core applications and identifies recurring gaps: benchmarks dominated by Python and toy problems, a lack of standardized security datasets, data leakage in evaluation, and fragile adversarial robustness. A comparative study of six state-of-the-art models illustrates these challenges: insecure patterns persist in generated code, vulnerability detection is brittle to semantics-preserving attacks, fine-tuning often misaligns with security objectives, and code translation yields uneven security benefits. From this analysis, we distill three forward paths: embedding secure-by-default practices in code generation, building robust and comprehensive detection benchmarks, and leveraging translation as a route to security-enhanced languages. We call for a shift toward security-first AI4Code, in which vulnerability mitigation and robustness are embedded throughout the development life cycle.
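To make two of these findings concrete, the minimal sketch below (our own hypothetical illustration, not output from any of the six evaluated models) contrasts an insecure SQL pattern of the kind generation models frequently emit with its secure-by-default counterpart, and shows a semantics-preserving rewrite that illustrates why surface-pattern vulnerability detectors are brittle.

```python
import sqlite3

# Insecure pattern frequently emitted by code-generation models:
# untrusted input interpolated into the SQL string (CWE-89, SQL injection).
def find_user_insecure(conn: sqlite3.Connection, username: str):
    query = f"SELECT id, name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

# Secure-by-default counterpart: a parameterized query.
def find_user_safe(conn: sqlite3.Connection, username: str):
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (username,)
    ).fetchall()

# Semantics-preserving rewrite of the insecure function (identifiers
# renamed, dead statement inserted): the vulnerability is unchanged,
# yet detectors keyed to the original surface form often miss it.
def fetch_record(db: sqlite3.Connection, u: str):
    _ = len(u)  # dead code introduced by the transformation
    q = "SELECT id, name FROM users WHERE name = '" + u + "'"
    return db.execute(q).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    payload = "x' OR '1'='1"
    print(find_user_insecure(conn, payload))  # injection succeeds: every row returned
    print(find_user_safe(conn, payload))      # parameterization: no rows returned
```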