Generation Probabilities Are Not Enough: Exploring the Effectiveness of Uncertainty Highlighting in AI-Powered Code Completions

Large-scale generative models have enabled the development of AI-powered code completion tools that assist programmers in writing code. However, like other AI-powered tools, these completions are not always accurate and can introduce bugs or even security vulnerabilities if a human programmer does not detect and correct them. One technique that has been proposed and implemented to help programmers identify potential errors is to highlight uncertain tokens. However, no empirical studies have explored the effectiveness of this technique or investigated the different and not-yet-agreed-upon notions of uncertainty in the context of generative models. We explore whether conveying information about uncertainty enables programmers to produce code more quickly and accurately when collaborating with an AI-powered code completion tool and, if so, which measure of uncertainty best fits programmers' needs. Through a mixed-methods study with 30 programmers, we compare three conditions: providing the AI system's code completion alone, highlighting tokens with the lowest likelihood of being generated by the underlying generative model, and highlighting tokens with the highest predicted likelihood of being edited by a programmer. We find that highlighting tokens with the highest predicted likelihood of being edited leads to faster task completion and more targeted edits, and is subjectively preferred by study participants. In contrast, highlighting tokens according to their probability of being generated provides no benefit over the baseline with no highlighting. We further explore the design space of how to convey uncertainty in AI-powered code completion tools and find that programmers prefer highlights that are granular, informative, interpretable, and not overwhelming.
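
To make the two highlighting conditions concrete, here is a minimal Python sketch of how they can select different tokens in the same completion. It is an illustration under stated assumptions, not the study's implementation: the function names, the 0.5 thresholds, the bracket rendering, and all per-token scores are hypothetical and invented for this example.

```python
from typing import List, Sequence


def lowest_generation_probability(gen_probs: Sequence[float], threshold: float) -> List[int]:
    """Indices of tokens the model itself assigned a low generation probability."""
    return [i for i, p in enumerate(gen_probs) if p < threshold]


def highest_edit_likelihood(edit_probs: Sequence[float], threshold: float) -> List[int]:
    """Indices of tokens a separate predictor deems likely to be edited."""
    return [i for i, p in enumerate(edit_probs) if p > threshold]


def render(tokens: Sequence[str], highlighted: Sequence[int]) -> str:
    """Mark highlighted tokens with square brackets (a stand-in for UI highlighting)."""
    marked = set(highlighted)
    return " ".join(f"[{t}]" if i in marked else t for i, t in enumerate(tokens))


# Hypothetical completion and made-up scores, purely for illustration.
tokens = ["return", "a", "-", "b"]
gen_probs = [0.95, 0.40, 0.90, 0.85]   # assumed model token probabilities
edit_probs = [0.05, 0.10, 0.80, 0.15]  # assumed output of an edit-likelihood predictor

by_generation = render(tokens, lowest_generation_probability(gen_probs, threshold=0.5))
by_edit = render(tokens, highest_edit_likelihood(edit_probs, threshold=0.5))

print(by_generation)  # return [a] - b
print(by_edit)        # return a [-] b
```

In this invented example the model is least sure about the identifier `a`, while the edit predictor flags the operator `-`. The two signals can disagree in this way, which is why the study compares them as separate conditions.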
