A large part of software development involves conceptualizing or communicating the underlying procedures and logic that need to be expressed in programs. One major difficulty of programming is turning concepts into code, especially when dealing with the APIs of unfamiliar libraries. Recently, there has been a proliferation of machine learning methods for code generation and retrieval from natural language queries, but these have been evaluated primarily on retrieval accuracy or on the overlap of generated code with developer-written code, and their actual effect on the developer workflow is surprisingly unattested. We perform the first comprehensive investigation of the promise and challenges of using such technology inside the IDE, asking "at the current state of technology, does it improve developer productivity or accuracy, how does it affect the developer experience, and what are the remaining gaps and challenges?" We first develop a plugin for the IDE that implements a hybrid of code generation and code retrieval functionality, and orchestrate virtual environments to enable collection of many user events. We ask developers with various backgrounds to complete 14 Python programming tasks ranging from basic file manipulation to machine learning or data visualization, with or without the help of the plugin. While qualitative surveys of developer experience are largely positive, quantitative results with regard to increased productivity, code quality, or program correctness are inconclusive. Our analysis identifies several pain points that, if addressed, could improve the effectiveness of future machine-learning-based code generation/retrieval developer assistants, and demonstrates when developers prefer code generation over code retrieval and vice versa. We release all data and software to pave the way for future empirical studies and the development of better models.