The significant momentum and importance of Mining Software Repositories (MSR) in Software Engineering (SE) have fostered new opportunities and challenges for extensive empirical research. However, MSR researchers seem to struggle to situate the empirical methods they use within the existing empirical SE body of knowledge. This is especially true of MSR experiments. To provide evidence on the special characteristics of MSR experiments and how they differ from the experiments traditionally acknowledged in SE, we elicited the hallmarks that differentiate an experiment from other types of empirical studies and characterized the hallmarks and types of experiments in MSR. We analyzed MSR literature obtained from a small-scale systematic mapping study to assess the use of the term "experiment" in MSR. We found that 19% of the papers claiming to be experiments are in fact not experiments at all but observational studies, so they use the term in a misleading way. Of the remaining 81% of the papers, only one describes a genuine controlled experiment, while the others describe experiments with limited control. MSR researchers tend to overlook such limitations, compromising the interpretation of the results of their studies. We provide recommendations and insights to support the improvement of MSR experiments.