Does class size matter? An in-depth assessment of the effect of class size in software defect prediction

Over the past 20 years, defect prediction studies have generally acknowledged the effect of class size on software prediction performance. To quantify the relationship between object-oriented (OO) metrics and defects, a model has to take into account the direct, and potentially indirect, effects of class size on defects. However, some studies have shown that size cannot simply be controlled for or ignored when building prediction models. The question therefore remains whether, and when, to control for class size. This study provides a new in-depth examination of the impact of class size on the relationship between OO metrics and software defects or defect-proneness. We use regression-based mediation analysis (with bootstrapping) and moderation analysis to investigate the direct and indirect effects of class size in count and binary defect prediction, that is, on the number of defects and on defect-proneness in software systems. Our results show that the size effect is not significant for every metric. Of the seven OO metrics we investigated, size has a consistently significant mediation effect only on the relationship between Coupling Between Objects (CBO) and defects/defect-proneness, and a potential moderation effect on the relationship between Fan-out and defects/defect-proneness. Based on these results, we make three recommendations. First, we encourage researchers and practitioners to examine the impact of class size on the specific data they have in hand, using the proposed statistical mediation/moderation procedures. Second, we encourage empirical studies to investigate the indirect effects of possible additional variables in their models when relevant. Third, the statistical procedures adopted in this study could be used in other empirical software engineering research to investigate the influence of potential mediators/moderators.
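To make the mediation and moderation procedures more concrete, the following is a minimal sketch of how a bootstrapped indirect effect and an interaction term might be estimated. It is not the study's implementation. It assumes ordinary least squares in place of the count and binary regression models the abstract refers to, runs on synthetic data, and uses hypothetical names throughout (`ols_coefs`, `indirect_effect`, `bootstrap_indirect_ci`, `interaction_coef`, `cbo`, `loc`, `defects`).

```python
import numpy as np


def ols_coefs(y, *predictors):
    """Least-squares coefficients for y ~ 1 + predictors (intercept first)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


def indirect_effect(metric, size, defects):
    """Product-of-coefficients estimate of the effect of metric on defects via size."""
    a = ols_coefs(size, metric)[1]           # path a: metric -> size
    b = ols_coefs(defects, metric, size)[2]  # path b: size -> defects, given metric
    return a * b


def bootstrap_indirect_ci(metric, size, defects, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = np.random.default_rng(seed)
    n = len(metric)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample classes with replacement
        estimates[i] = indirect_effect(metric[idx], size[idx], defects[idx])
    low, high = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return low, high


def interaction_coef(metric, size, defects):
    """Coefficient of the metric x size product term (moderation), mean-centred."""
    m_c = metric - metric.mean()
    s_c = size - size.mean()
    return ols_coefs(defects, m_c, s_c, m_c * s_c)[3]


# Synthetic, standardised data (an assumption for illustration only):
# size partly depends on the metric, and defects depend mostly on size.
rng = np.random.default_rng(42)
n = 500
cbo = rng.normal(size=n)
loc = 0.8 * cbo + rng.normal(size=n)
defects = 0.5 * loc + 0.1 * cbo + rng.normal(size=n)

point = indirect_effect(cbo, loc, defects)
ci_low, ci_high = bootstrap_indirect_ci(cbo, loc, defects)
moderation = interaction_coef(cbo, loc, defects)
print(f"indirect effect: {point:.3f}, 95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"interaction coefficient: {moderation:.3f}")
```

In this sketch, a bootstrap interval that excludes zero would be read as evidence that size mediates the metric-defect relationship, while a clearly non-zero interaction coefficient would point towards moderation. For real defect data, the least-squares fits would need to be swapped for a regression family suited to counts or binary outcomes.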
