When AI Takes the Wheel: Security Analysis of Framework-Constrained Program Generation

In recent years, the AI wave has grown rapidly in software development. Even novice developers can now design and generate complex framework-constrained software systems based on their high-level requirements with the help of Large Language Models (LLMs). However, when LLMs gradually "take the wheel" of software development, developers may only check whether the program works. They often miss security problems hidden in how the generated programs are implemented. In this work, we investigate the security properties of framework-constrained programs generated by state-of-the-art LLMs. We focus specifically on Chrome extensions due to their complex security model involving multiple privilege boundaries and isolated components. To achieve this, we built ChromeSecBench, a dataset with 140 prompts based on known vulnerable extensions. We used these prompts to instruct nine state-of-the-art LLMs to generate complete Chrome extensions, and then analyzed them for vulnerabilities across three dimensions: scenario types, model differences, and vulnerability categories. Our results show that LLMs produced vulnerable programs at alarmingly high rates (18%-50%), particularly in Authentication & Identity and Cookie Management scenarios (up to 83% and 78% respectively). Most vulnerabilities exposed sensitive browser data like cookies, history, or bookmarks to untrusted code. Interestingly, we found that advanced reasoning models performed worse, generating more vulnerabilities than simpler models. These findings highlight a critical gap between LLMs' coding skills and their ability to write secure framework-constrained programs.
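To make the main vulnerability class concrete (sensitive browser data such as cookies reaching untrusted code), the following is a minimal sketch of the kind of sender check a privileged extension component might apply before answering a message. It is an illustration only and is not taken from the paper or from ChromeSecBench: the names `Sender`, `CookieRequest`, `TRUSTED_ORIGINS`, `isTrustedSender`, and `handleMessage` are hypothetical, the origin `https://app.example.com` is a placeholder, and the browser API is replaced by an injected `readCookies` function so the example runs outside a browser.

```typescript
// Hypothetical model of a privileged message handler in an extension.
// No real chrome.* API is called; cookie access is injected as a function.

interface Sender {
  // Origin of the page or script that sent the message, if known (assumed field).
  origin?: string;
}

interface CookieRequest {
  type: "getCookies";
  domain: string;
}

// Assumed allowlist of origins permitted to request cookie data.
const TRUSTED_ORIGINS: ReadonlySet<string> = new Set(["https://app.example.com"]);

// Returns true only when the sender's origin is present and explicitly allowlisted.
function isTrustedSender(sender: Sender, trusted: ReadonlySet<string>): boolean {
  return sender.origin !== undefined && trusted.has(sender.origin);
}

// Answers a cookie request only for trusted senders; otherwise returns nothing.
// A handler that skips this check would hand cookies to any page able to message it.
function handleMessage(
  msg: CookieRequest,
  sender: Sender,
  readCookies: (domain: string) => string[],
): string[] {
  if (!isTrustedSender(sender, TRUSTED_ORIGINS)) {
    return [];
  }
  return readCookies(msg.domain);
}
```

A generated extension that works functionally but omits a check of this kind would still pass a "does it run" review, which is the gap the abstract describes.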

