In the multi-commit development model, programmers complete a task (e.g., implementing a feature) by organizing their work into several commits and packaging them into a commit-set. Analyzing data from developers who use this model can help address challenging developer needs, such as knowing which features introduce a bug and assessing the risk of integrating certain features into a release. To do so, however, one first needs to identify fix-inducing commit-sets. The SZZ algorithm is the most natural candidate for this identification, but its performance has not yet been evaluated in the multi-commit context. In this study, we conduct an in-depth investigation of the reliability and performance of SZZ in the multi-commit model. To obtain a reliable ground truth, we take an existing SZZ dataset and adapt it to the multi-commit context. Moreover, we devise a second, more extensive dataset created directly by developers and Quality Assurance (QA) engineers at Mozilla. Based on these datasets, we (1) test the performance of B-SZZ and its non-language-specific variations in the context of the multi-commit model, (2) investigate the reasons behind their specific behavior, and (3) analyze the impact of non-relevant commits in a commit-set and automatically detect them before using SZZ.