Generative machine learning models have recently been applied to source code for use cases including translating code between programming languages, creating documentation from code, and auto-completing methods. Yet state-of-the-art models often produce code that is erroneous or incomplete. In a controlled study with 32 software engineers, we examined whether such imperfect outputs are helpful in the context of Java-to-Python code translation. When aided by the outputs of a code translation model, participants produced code with fewer errors than when working alone. We also examined how the quality and quantity of AI translations affected the work process and the quality of outcomes, and observed that providing multiple translations had a larger impact on the translation process than varying the quality of the provided translations. Our results tell a complex, nuanced story about the benefits of generative code models and the challenges software engineers face when working with their outputs. Our work motivates the need for intelligent user interfaces that help software engineers work effectively with generative code models, so that they can understand and evaluate the models' outputs and achieve better outcomes than they would working alone.