Commit messages play an important role in software development, especially in large teams. Developers with different writing styles often work on the same project, which makes it difficult to maintain a consistent pattern of informative commit messages; the most frequent issue is that messages are not descriptive enough. In this paper, we apply neural machine translation (NMT) techniques to convert code diffs into commit messages, and we present an improved sketch-based encoder for this task. Our approach has three parts. First, we identify a more suitable NMT baseline for this problem. Second, we show that the performance of NMT models can be improved by training on examples containing a specific file type. Third, we introduce a novel sketch-based neural model, inspired by recent approaches to code generation, and show that the sketch-based encoder significantly outperforms existing state-of-the-art solutions. Results on two datasets introduced in recent years for this task show that the improvement is especially relevant for Java source code files.
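The second step, training on examples restricted to a specific file type, can be illustrated with a minimal sketch. It assumes each training example is a (diff, commit message) pair whose diff is in git's unified format, and it keeps an example only when every changed file has the target extension. The helper names `changed_files` and `filter_by_extension` are hypothetical, and the "every file must match" rule is an assumption made for illustration, not necessarily the selection criterion used in the paper.

```python
import re
from typing import Iterable, List, Tuple

# Matches the header git emits for each changed file, e.g.
# "diff --git a/src/Main.java b/src/Main.java".
# Assumption: paths contain no whitespace (quoted paths are not handled).
_DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$", re.MULTILINE)


def changed_files(diff_text: str) -> List[str]:
    """Return the post-change path of every file touched by a git diff."""
    return [match.group(2) for match in _DIFF_HEADER.finditer(diff_text)]


def filter_by_extension(
    pairs: Iterable[Tuple[str, str]], extension: str = ".java"
) -> List[Tuple[str, str]]:
    """Keep (diff, message) pairs whose diff touches only `extension` files."""
    kept = []
    for diff_text, message in pairs:
        files = changed_files(diff_text)
        # Drop diffs with no recognizable header and diffs mixing file types.
        if files and all(path.endswith(extension) for path in files):
            kept.append((diff_text, message))
    return kept
```

Requiring all changed files to match keeps the filtered corpus homogeneous; a looser variant could use `any` in place of `all` to keep diffs that touch at least one file of the target type.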