Commit messages are the atomic level of software documentation. They provide a natural-language description of a code change and its purpose, and are therefore critical for software maintenance and program comprehension. Unlike the documentation of feature updates and bug fixes, little is known about how developers document their refactoring activities. Developers can perform multiple refactoring operations, such as moving methods and extracting classes, for various reasons. Yet no systematic study has analyzed the extent to which refactoring documentation accurately describes the refactoring operations performed at the source-code level. This paper therefore tests whether refactoring documentation is sufficient to predict the refactoring types performed at the commit level. Our analysis relies on text mining of commit messages to extract the features that best represent each class. Extracting the text patterns specific to each refactoring type allows us to design a model that verifies the consistency of these patterns with their corresponding refactoring. This verification is achieved by automatically predicting the method-level type of refactoring being applied, namely Extract Method, Inline Method, Move Method, Pull-up Method, Push-down Method, and Rename Method. We compared various classifiers and a baseline keyword-based approach in terms of prediction performance, using a dataset of 5,004 commits. Our main findings show that the difficulty of refactoring type prediction varies from one type to another. Rename Method and Extract Method were found to be the best-documented refactoring activities, while Pull-up Method and Push-down Method were the hardest to identify from textual descriptions. These findings highlight the need for developers to document these types more carefully.
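To make the idea of a keyword-based baseline concrete, the following is a minimal Python sketch of how a commit message might be mapped to one of the six method-level refactoring types. It is an illustration only: the keyword lists, the `KEYWORDS` table, and the `predict_refactoring_type` function are assumptions introduced here, not the patterns or the implementation used in the study.

```python
import re
from typing import Optional

# Hypothetical keyword lists, chosen for illustration only.
# The patterns actually mined in the study are not given in this abstract.
KEYWORDS = {
    "Extract Method": ["extract", "split", "factor out"],
    "Inline Method": ["inline"],
    "Move Method": ["move", "relocate"],
    "Pull-up Method": ["pull up", "pull-up", "superclass"],
    "Push-down Method": ["push down", "push-down", "subclass"],
    "Rename Method": ["rename"],
}


def predict_refactoring_type(message: str) -> Optional[str]:
    """Return the refactoring type whose keywords occur most often.

    Returns None when no keyword matches. Ties are resolved in favor of
    the type listed first in KEYWORDS.
    """
    text = message.lower()
    best_label, best_hits = None, 0
    for label, keywords in KEYWORDS.items():
        # Match at a word start so that "remove" does not count as "move",
        # while inflected forms such as "moved" or "renamed" still match.
        hits = sum(
            len(re.findall(r"\b" + re.escape(keyword), text))
            for keyword in keywords
        )
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label
```

A baseline of this kind only succeeds when the commit message names the operation explicitly, which is one plausible reason a learned classifier is compared against it.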