With the rapid growth of software scale and complexity, a large number of bug reports are submitted to bug tracking systems. To speed up defect repair, these reports need to be classified accurately so that they can be routed to the appropriate developers. However, existing classification methods use only the text of a bug report, which limits their performance. To address this problem, this paper proposes a new automatic classification method for bug reports. Its novelty is that, in addition to the text of a report, it also considers the report's intention (i.e., suggestion or explanation), thereby improving classification performance. First, we collect bug reports from four ecosystems (Apache, Eclipse, Gentoo, Mozilla) and manually annotate them to construct an experimental data set. Then, we preprocess the data using natural language processing techniques. On this basis, BERT and TF-IDF are used to extract features from the intention and from the multiple kinds of text information. Finally, these features are used to train the classifiers. Experimental results on five classifiers (K-Nearest Neighbor, Naive Bayes, Logistic Regression, Support Vector Machine, and Random Forest) show that the proposed method achieves better performance, with an F-Measure ranging from 87.3% to 95.5%.
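The idea of combining text features with the report's intention can be illustrated with a minimal sketch. It is not the paper's implementation: it uses only a hand-rolled TF-IDF (no BERT), encodes the intention as a one-hot vector, and classifies with a 1-nearest-neighbor rule under cosine similarity. The toy reports, the labels `ui` and `network`, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Toy, made-up reports: (text, intention, label). The labels are hypothetical.
TRAIN = [
    ("crash when opening settings dialog", "explanation", "ui"),
    ("please add dark theme to settings dialog", "suggestion", "ui"),
    ("connection timeout while fetching updates", "explanation", "network"),
    ("should retry connection after timeout", "suggestion", "network"),
]
# The two intention values named in the abstract.
INTENTIONS = ["suggestion", "explanation"]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def featurize(text, intention, idf, vocab):
    """TF-IDF vector over vocab, followed by a one-hot intention vector."""
    tf = Counter(tokenize(text))
    text_part = [tf[term] * idf[term] for term in vocab]
    intent_part = [1.0 if intention == name else 0.0 for name in INTENTIONS]
    return text_part + intent_part


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(text, intention, train=TRAIN):
    """Return the label of the most similar training report (1-NN)."""
    idf = fit_idf([doc for doc, _, _ in train])
    vocab = sorted(idf)
    query = featurize(text, intention, idf, vocab)
    scored = [
        (cosine(query, featurize(doc, intent, idf, vocab)), label)
        for doc, intent, label in train
    ]
    return max(scored)[1]


if __name__ == "__main__":
    print(predict("settings dialog crash on startup", "explanation"))
```

Appending the intention to the text vector lets two reports with similar wording but different intentions be told apart by the classifier. In a fuller pipeline, the one-hot part could be replaced by a learned embedding of the intention.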