Many software metrics are designed to measure aspects that are believed to be related to software quality. Static software metrics, e.g., size, complexity, and coupling, are used in defect prediction research as well as in software quality models to evaluate software quality. Static analysis tools also include boundary values for complexity and size that generate warnings for developers. While this indicates a relationship between quality and software metrics, the extent of it is not well understood. Moreover, recent studies found that complexity metrics may be unreliable indicators of the understandability of source code. To explore this relationship, we leverage the intent of developers about what constitutes a quality improvement in their own code base. We manually classify a randomized sample of 2,533 commits from 54 Java open source projects as quality improving or not, depending on the intent of the developer as expressed in the commit message. We distinguish between perfective and corrective maintenance via predefined guidelines and use this data as ground truth for fine-tuning a state-of-the-art deep learning model for natural language processing. We use the model to increase our data set to 125,482 commits. Based on the resulting data set, we investigate the differences in size and 14 static source code metrics between changes that increase quality and other changes. In addition, we investigate which files are targets of quality improvements. We find that quality-improving commits are smaller than other commits. Perfective changes have a positive impact on static source code metrics, while corrective changes add complexity. Files that are the target of perfective maintenance already have a lower median complexity than other files. Our results provide empirical evidence for which static source code metrics capture quality improvement from the developers' point of view.
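The comparison described above, between metric changes in perfective, corrective, and other commits, can be illustrated with a minimal sketch. It is not the study's actual pipeline: the record layout, the `median_delta_by_label` helper, the metric names, and all numbers are hypothetical and chosen only to show the shape of a per-category comparison of median metric deltas.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy data, NOT results from the study: each record holds a
# commit's maintenance label and the change in two static metrics that
# the commit introduced (after minus before).
commits = [
    {"label": "perfective", "delta": {"complexity": -2, "loc": -10}},
    {"label": "perfective", "delta": {"complexity": -1, "loc": -4}},
    {"label": "corrective", "delta": {"complexity": 1, "loc": 6}},
    {"label": "corrective", "delta": {"complexity": 3, "loc": 12}},
    {"label": "other", "delta": {"complexity": 0, "loc": 40}},
    {"label": "other", "delta": {"complexity": 2, "loc": 25}},
]


def median_delta_by_label(records, metric):
    """Group commits by label and return the median delta of one metric."""
    groups = defaultdict(list)
    for record in records:
        groups[record["label"]].append(record["delta"][metric])
    return {label: median(values) for label, values in groups.items()}


result = median_delta_by_label(commits, "complexity")
for label, value in sorted(result.items()):
    print(f"{label}: median complexity delta = {value}")
```

A negative median delta for a category would mean that its commits tend to reduce the metric. A real analysis would also need a statistical test and effect sizes, which this sketch leaves out.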