Modern version control systems such as Git or SVN include bug tracking mechanisms through which developers can flag bugs by filing bug reports, i.e., textual descriptions of the problem and of the steps that led to a failure. Over the years, the research community has deeply investigated methods for easing bug triage, that is, the process of assigning the fixing of a reported bug to the most qualified developer. Nevertheless, only a few studies have reported on how to support developers in understanding the type of a reported bug, which is the first and most time-consuming step to perform before assigning a bug-fix operation. In this paper, we target this problem in two ways: first, we analyze 1,280 bug reports from 119 popular projects belonging to three ecosystems (Mozilla, Apache, and Eclipse), with the aim of building a taxonomy of the root causes of reported bugs; then, we devise and evaluate an automated classification model able to classify reported bugs according to the defined taxonomy. As a result, we found nine main common root causes of bugs across the considered systems. Moreover, our model achieves high F-Measure and AUC-ROC (64% and 74% overall, respectively).