Developers collaboratively discuss, implement, use, and share software entities hosted on software repositories. Proper documentation plays an important role in successful software management and maintenance. Users rely on Issue Tracking Systems, a facility of software repositories, to keep track of issue reports, to manage workload and processes, and to document the highlights of their team's effort. An issue report is a rich source of collaboratively curated software knowledge and can contain a reported problem, a request for new features, or merely a question about the software product. As the number of issues grows, managing them manually becomes harder. GitHub provides labels for tagging issues as a means of issue management. However, about half of the issues in GitHub's top 1000 repositories do not have any labels. We aim to automate the process of managing issue reports for software teams. We propose a two-stage approach that predicts both the objective behind opening an issue and its priority level, using feature engineering methods and state-of-the-art text classifiers. To the best of our knowledge, we are the first to fine-tune a Transformer for issue classification. We train and evaluate our models in both project-based and cross-project settings. The latter setting provides a generic prediction model applicable to any unseen software project or to projects with little historical data. Our proposed approach predicts the objective and priority level of issue reports with 82% and 75% accuracy, respectively. Moreover, we conducted human labeling and evaluation on unlabeled issues from six unseen GitHub projects to assess the performance of the cross-project model on new data. The model achieves 90% accuracy. We obtain 85% average Percent Agreement and 71% Randolph's free-marginal Kappa, which translates to substantial agreement among labelers.
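The two agreement statistics reported for the human labeling can be illustrated with a minimal sketch. It assumes the standard multi-rater definitions: Percent Agreement as the mean fraction of agreeing rater pairs per item, and Randolph's free-marginal Kappa as (P_o - 1/k) / (1 - 1/k) for k categories. The function names, category names, and toy ratings below are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def percent_agreement(ratings):
    """Mean fraction of agreeing rater pairs per item.

    `ratings` is a list of items; each item is a list holding the label
    assigned by each rater (at least two raters per item).
    """
    total = 0.0
    for item in ratings:
        n = len(item)
        counts = Counter(item)
        # Ordered pairs of raters that chose the same label for this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        total += agreeing_pairs / (n * (n - 1))
    return total / len(ratings)


def randolph_kappa(ratings, num_categories):
    """Free-marginal kappa: chance agreement is fixed at 1 / num_categories."""
    p_observed = percent_agreement(ratings)
    p_expected = 1.0 / num_categories
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: four issues, three labelers, three categories.
toy_ratings = [
    ["bug", "bug", "bug"],
    ["bug", "feature", "bug"],
    ["question", "question", "question"],
    ["feature", "feature", "bug"],
]

print(percent_agreement(toy_ratings))
print(randolph_kappa(toy_ratings, num_categories=3))
```

Unlike Fleiss' kappa, the free-marginal variant does not estimate chance agreement from the observed label distribution, which is the usual reason to prefer it when labelers are not constrained to assign a fixed share of items to each category.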