Software engineering research has long been concerned with improving code completion approaches, which suggest the next tokens a developer is likely to type while coding. The release of GitHub Copilot is a major step forward, not least because of its unprecedented ability to automatically generate entire functions from their natural language description. While the usefulness of Copilot is evident, it is still unclear how robust it is. Specifically, we do not know the extent to which semantics-preserving changes in the natural language description provided to the model affect the generated function. In this paper we present an empirical study aimed at understanding whether different but semantically equivalent natural language descriptions result in the same recommended function. A negative answer would raise questions about the robustness of deep learning (DL)-based code generators, since it would imply that developers using different wordings to describe the same code would obtain different recommendations. We asked Copilot to automatically generate 892 Java methods starting from their original Javadoc description. Then, we generated different semantically equivalent descriptions for each method, both manually and automatically, and analyzed the extent to which the predictions generated by Copilot changed. Our results show that modifying the description results in different code recommendations in ~46% of cases. Also, differences in the semantically equivalent descriptions might affect the correctness of the generated code (~28%).
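As an illustration of the kind of measurement described above, the following minimal sketch computes the share of methods for which a paraphrased description yields a different prediction than the original one. It rests on assumptions that are not taken from the study: the comparison criterion (exact match after whitespace normalization), the helper names `normalize` and `fraction_changed`, and the toy predictions are all hypothetical.

```python
import re


def normalize(code: str) -> str:
    """Collapse runs of whitespace so purely cosmetic differences are ignored."""
    return re.sub(r"\s+", " ", code).strip()


def fraction_changed(pairs):
    """Share of (original, paraphrased) predictions that differ after normalization.

    Assumption: two predictions count as "the same" only if their
    whitespace-normalized text is identical. The study may use another criterion.
    """
    if not pairs:
        raise ValueError("need at least one pair of predictions")
    changed = sum(1 for original, paraphrased in pairs
                  if normalize(original) != normalize(paraphrased))
    return changed / len(pairs)


# Toy, made-up predictions for illustration only; not data from the study.
toy_pairs = [
    # Same code, different layout: counted as unchanged.
    ("int add(int a, int b) { return a + b; }",
     "int add(int a, int b) {\n    return a + b;\n}"),
    # Different implementation: counted as changed.
    ("int max(int a, int b) { return a > b ? a : b; }",
     "int max(int a, int b) { return Math.max(a, b); }"),
]

print(fraction_changed(toy_pairs))
```

A textual comparison like this only detects that a recommendation changed. Judging whether the change affects correctness would require a separate check, for example running the generated method against tests.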