Using a Balanced Scorecard to Identify Opportunities to Improve Code Review Effectiveness: An Industrial Experience Report

Peer code review is a widely adopted software engineering practice for ensuring code quality and software reliability in both commercial and open-source software projects. Because practicing code reviews carries a large effort overhead, project managers often wonder whether their code reviews are effective and whether there are opportunities for improvement. Since project managers at Samsung Research Bangladesh (SRBD) were also intrigued by these questions, this research developed, deployed, and evaluated a production-ready solution using the Balanced Scorecard (BSC) strategy that SRBD managers can use in their day-to-day management to monitor the code review effectiveness of an individual developer, a particular project, or the entire organization. Following the four-step framework of the BSC strategy, we: 1) defined the operational goals of this research, 2) defined a set of metrics to measure the effectiveness of code reviews, 3) developed an automated mechanism to measure those metrics, and 4) developed and evaluated a monitoring application to inform the key stakeholders. Our automated model to identify useful code reviews achieves improvements of 7.88% in accuracy and 14.39% in minority class F1 score over the models proposed in prior studies. It also outperforms the human evaluators from SRBD whom the model replaces, by margins of 25.32% in accuracy and 23.84% in minority class F1 score. In our post-deployment survey, SRBD developers and managers indicated that they found our solution useful and that it provided them with important insights to support their decision making.
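For readers unfamiliar with the two reported measures, the following is a minimal sketch of how accuracy and minority class F1 score can be computed for a binary "useful / not useful" review classifier. The label names, the sample data, and the helper functions `accuracy` and `minority_class_f1` are illustrative assumptions, not the model, the labels, or the dataset used at SRBD.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def minority_class_f1(y_true, y_pred):
    """Return (minority_label, F1) where the minority label is the least
    frequent label in y_true. Ties resolve to the label seen first."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    minority = min(Counter(y_true).items(), key=lambda kv: kv[1])[0]
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == minority and p == minority for t, p in pairs)
    fp = sum(t != minority and p == minority for t, p in pairs)
    fn = sum(t == minority and p != minority for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return minority, 0.0
    return minority, 2 * precision * recall / (precision + recall)

# Hypothetical labels: 8 useful reviews and 2 not-useful reviews.
y_true = ["useful"] * 8 + ["not_useful"] * 2
# Hypothetical predictions: one useful review misclassified,
# and one of the two not-useful reviews misclassified.
y_pred = ["useful"] * 7 + ["not_useful"] + ["not_useful", "useful"]

print(accuracy(y_true, y_pred))           # 0.8
print(minority_class_f1(y_true, y_pred))  # ('not_useful', 0.5)
```

In this toy data the classifier reaches 80% accuracy while scoring only 0.5 F1 on the rarer class, which illustrates why a minority class F1 score is commonly reported next to accuracy when the classes are imbalanced.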
