Does chronology matter in JIT defect prediction? A Partial Replication Study

Just-In-Time (JIT) defect prediction models identify fix-inducing (that is, defect-inducing) code changes. These models rest on the assumption that the properties of past code changes are similar to those of future ones. However, as a system evolves, the expertise of its developers and the complexity of the system change as well. In this work, we investigate how the effect of code change properties on JIT models varies over time. We also compare the performance of JIT models trained on recent data with that of models trained on all available data. Further, we analyze the effect of weighted sampling on the performance of JIT models in identifying fix-inducing changes. For this purpose, we used datasets from Eclipse JDT, Mozilla, Eclipse Platform, and PostgreSQL. We used five families of code change properties: size, diffusion, history, experience, and purpose. We trained and tested the JIT models with Random Forest and measured performance with the Brier score and the area under the ROC curve (AUC). Our results suggest that the predictive power of JIT models does not change over time. Furthermore, we observed that the chronology of the data can be disregarded in JIT defect prediction when all available data is used. On the other hand, the importance scores of the families of code change properties oscillate over time. To mitigate the impact of the evolution of code change properties, we recommend a weighted sampling approach that places more emphasis on changes occurring closer to the current time. Moreover, since properties such as "Expertise of the Developer" and "Size" evolve with time, models trained on old data may exhibit different characteristics from those trained on newer data. Hence, practitioners should regularly retrain JIT models to include fresh data.
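The recency-weighted sampling and the Brier score mentioned above can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so the exponential decay and its `half_life` parameter are assumptions made purely for illustration, as are all function names. The Brier score follows its standard definition: the mean squared difference between the predicted probability and the actual outcome.

```python
import random


def recency_weights(timestamps, half_life):
    """Assumed scheme: exponential decay relative to the newest change.

    A change that is `half_life` time units older than the newest one
    receives half of its weight.
    """
    newest = max(timestamps)
    return [0.5 ** ((newest - t) / half_life) for t in timestamps]


def weighted_sample(changes, timestamps, k, half_life, seed=0):
    """Draw k training changes with replacement, favouring recent ones."""
    rng = random.Random(seed)
    weights = recency_weights(timestamps, half_life)
    return rng.choices(changes, weights=weights, k=k)


def brier_score(probabilities, labels):
    """Mean squared difference between predicted probability and outcome.

    Labels are 1 for a fix-inducing change and 0 otherwise; lower is better.
    """
    if len(probabilities) != len(labels):
        raise ValueError("probabilities and labels must have equal length")
    total = sum((p - y) ** 2 for p, y in zip(probabilities, labels))
    return total / len(labels)
```

A training set drawn with `weighted_sample` could then be passed to any classifier, such as the Random Forest used in the study. A smaller `half_life` discounts older changes more aggressively.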
