Many software metrics are designed to measure aspects that are believed to be related to software quality. Static software metrics, e.g., size, complexity, and coupling, are used in defect prediction research as well as in software quality models to evaluate software quality. While this indicates a relationship between quality and software metrics, its extent is not well understood. Moreover, recent studies found that complexity metrics may be unreliable indicators for the understandability of source code. To explore this relationship, we leverage the intent of developers about what constitutes a quality improvement in their own code base. We manually classify a randomized sample of 2,533 commits from 54 Java open source projects as quality improving depending on the intent of the developer, which we determine by inspecting the commit message. We distinguish between perfective and corrective maintenance via predefined guidelines and use this data as ground truth for fine-tuning a state-of-the-art deep learning model for natural language processing. We use the model to increase our data set to 125,482 commits. Based on the resulting data set, we investigate the differences in size and 14 static source code metrics between changes that increase quality, as indicated by the developer, and other changes. We find that quality-improving commits are smaller than other commits. Perfective changes have a positive impact on static source code metrics, while corrective changes tend to add complexity. Furthermore, we find that files which are the target of perfective maintenance already have a lower median complexity than other files. Our study results provide empirical evidence for which static source code metrics capture quality improvement from the developers' point of view. This has implications for program understanding as well as code smell detection and recommender systems.
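The comparison described above, between commit categories in terms of size and metric change, can be illustrated with a minimal sketch. It is not the study's analysis pipeline: the record layout, the function name `median_by_label`, and all values are hypothetical and chosen only to show the idea of comparing medians per category.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (label, change in a complexity metric, changed lines).
# The values are made up for illustration and are not data from the study.
commits = [
    ("perfective", -3, 12),
    ("perfective", -1, 8),
    ("corrective", 2, 20),
    ("corrective", 1, 15),
    ("other", 4, 60),
    ("other", 0, 45),
]


def median_by_label(records):
    """Return {label: (median metric delta, median size)} for the records."""
    deltas = defaultdict(list)
    sizes = defaultdict(list)
    for label, delta, size in records:
        deltas[label].append(delta)
        sizes[label].append(size)
    return {
        label: (median(deltas[label]), median(sizes[label]))
        for label in deltas
    }


summary = median_by_label(commits)
for label, (delta, size) in summary.items():
    print(f"{label}: median metric delta={delta}, median size={size}")
```

A negative median delta for a category would correspond to that kind of change reducing the metric, and a positive one to adding to it. A real analysis would also need a statistical test and effect sizes, which this sketch omits.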