With the advent of powerful neural language models, AI-based systems that assist developers in coding tasks are becoming widely available; Copilot is one such system. Copilot uses Codex, a large language model (LLM), to complete code conditioned on a preceding "prompt". Codex, however, is trained on public GitHub repositories, viz., on code that may include bugs and vulnerabilities. Previous studies [1], [2] show that Codex reproduces vulnerabilities seen in training. In this study, we examine how prone Codex is to generate an interesting bug category: single statement bugs, commonly referred to as simple, stupid bugs or SStuBs in the MSR community. We find that Codex and similar LLMs do help avoid some SStuBs, but they also produce known, verbatim SStuBs up to twice as often as known, verbatim correct code. We explore the consequences of the Codex-generated SStuBs and propose avoidance strategies that suggest the possibility of reducing the production of known, verbatim SStuBs and increasing the possibility of producing known, verbatim fixes.