Software repositories such as GitHub host a large number of software entities. Developers collaboratively discuss, implement, use, and share these entities. Proper documentation plays an important role in successful software management and maintenance. Users rely on issue tracking systems, a facility of software repositories, to keep track of issue reports, manage workloads and processes, and document the highlights of their team's effort. An issue report is a rich source of collaboratively curated software knowledge and can describe a reported problem, request a new feature, or simply ask a question about the software product. As the number of issues grows, managing them manually becomes harder. GitHub provides labels for tagging issues as a means of issue management. However, about half of the issues in GitHub's top 1000 repositories have no labels at all. In this work, we aim to automate the management of issue reports for software teams. We propose a two-stage approach that predicts both the objective behind opening an issue and its priority level, using feature engineering methods and state-of-the-art text classifiers. We train and evaluate our models in both project-based and cross-project settings. The cross-project setting yields a generic prediction model that is applicable to unseen software projects and to projects with little historical data. Our approach predicts the objective and the priority level of issue reports with 82% and 75% accuracy, respectively. Moreover, we conduct a user study on unlabeled issues from six unseen GitHub projects to assess how the cross-project model performs on new data. The model achieves 90% accuracy on the sampled issues.