Code Review (CR) is a cornerstone of software quality assurance and a crucial practice in software development. As CR research matures, it can be difficult to keep track of the best practices and the state of the art in methodologies, datasets, and metrics. This paper investigates the potential of benchmarking by collecting the methodologies, datasets, and metrics of CR studies. A systematic mapping study was conducted: a total of 112 studies were selected and analyzed from 19,847 papers published in high-impact venues between 2011 and 2019. First, we find that empirical evaluation is the most common methodology (65% of papers), with solution and experience papers being the least common. Second, we highlight that 50% of the papers that use a quantitative or mixed method have the potential for replicability. Third, we identify 457 metrics grouped into sixteen core metric sets and applied to nine Software Engineering topics, showing that different research topics tend to use specific metric sets. We conclude that, at this stage, we cannot benchmark CR studies. Nevertheless, a common benchmark would help new researchers, including experts from other fields, to innovate new techniques and build on top of established methodologies. A full replication package is available at https://naist-se.github.io/code-review/.