Commit messages play an important role in several software engineering tasks, such as program comprehension and understanding program evolution. However, programmers often neglect to write good commit messages. Hence, several Commit Message Generation (CMG) tools have been proposed. We observe that recent state-of-the-art CMG tools are evaluated with simple, easy-to-compute automated metrics such as BLEU4 or its variants. Advances in the field of Machine Translation (MT) have exposed several weaknesses of BLEU4 and its variants, and the MT community has proposed several alternative metrics for evaluating Natural Language Generation (NLG) systems. In this work, we discuss the suitability of various MT metrics for the CMG task. Based on insights from our experiments, we propose a new metric variant tailored specifically to evaluating CMG. We re-evaluate the state-of-the-art CMG tools with this new metric. We believe our work fills an important gap in the understanding of evaluation metrics for CMG research.
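To make the discussion of BLEU4 concrete, the following is a minimal sketch (not the paper's implementation) of how BLEU-4 scores a generated commit message against a reference: a geometric mean of 1- to 4-gram precisions times a brevity penalty. The function name `bleu4`, the add-one smoothing choice, and the example messages are our own illustrative assumptions; production work would use an established implementation such as NLTK or sacreBLEU.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(reference, candidate):
    """Minimal BLEU-4 sketch (illustrative, not the paper's metric):
    geometric mean of 1..4-gram precisions with add-one smoothing,
    multiplied by a brevity penalty."""
    precisions = []
    for n in range(1, 5):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        # Add-one smoothing so short, commit-length texts do not
        # collapse to zero when a higher-order n-gram is missing --
        # one of the weaknesses of plain BLEU4 on short outputs.
        precisions.append((overlap + 1) / (total + 1))
    log_avg = sum(math.log(p) for p in precisions) / 4
    brevity = min(1.0, math.exp(1 - len(reference) / max(len(candidate), 1)))
    return brevity * math.exp(log_avg)

ref = "fix null pointer dereference in parser".split()
cand = "fix null pointer dereference in parser".split()
print(round(bleu4(ref, cand), 3))  # identical messages score 1.0
```

Even this toy version shows why commit-message length matters: with only a handful of tokens, a single missing 4-gram swings the score sharply, which motivates examining alternative MT metrics for the CMG task.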