Predicting the Objective and Priority of Issue Reports in Software Repositories

Software repositories such as GitHub host a large number of software entities. Developers collaboratively discuss, implement, use, and share these entities. Proper documentation plays an important role in successful software management and maintenance. Users rely on Issue Tracking Systems, a facility of software repositories, to keep track of issue reports, to manage workload and processes, and finally to document the highlights of their team's effort. An issue report is a rich source of collaboratively curated software knowledge, and can contain a reported problem, a request for a new feature, or merely a question about the software product. As the number of issues grows, it becomes harder to manage them manually. GitHub provides labels for tagging issues as a means of issue management. However, about half of the issues in GitHub's top 1000 repositories do not have any labels. In this work, we aim to automate the process of managing issue reports for software teams. We propose a two-stage approach that predicts both the objective behind opening an issue and its priority level, using feature engineering methods and state-of-the-art text classifiers. We train and evaluate our models in both project-based and cross-project settings. The cross-project setting yields a generic prediction model applicable to any unseen software project or to projects with little historical data. Our proposed approach predicts the objective and priority level of issue reports with 82% and 75% accuracy, respectively. Moreover, we conducted a user study on unlabeled issues from six unseen GitHub projects to assess the performance of the cross-project model on new data. The model achieves 90% accuracy on the sample set.
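The difference between the project-based and cross-project settings mentioned above can be illustrated with a small sketch. The abstract does not state the exact splitting protocol, so the leave-one-project-out scheme, the 20% test fraction, the record fields (`project`, `text`, `objective`, `priority`), and the label values below are all assumptions made for illustration, not the authors' setup.

```python
import random
from collections import defaultdict

# Hypothetical issue records; field names and label values are illustrative only.
ISSUES = [
    {"project": "alpha", "text": "Crash on startup", "objective": "bug", "priority": "high"},
    {"project": "alpha", "text": "Add dark mode", "objective": "feature", "priority": "low"},
    {"project": "beta", "text": "How do I configure logging?", "objective": "question", "priority": "low"},
    {"project": "beta", "text": "Memory leak in parser", "objective": "bug", "priority": "high"},
    {"project": "gamma", "text": "Support YAML config", "objective": "feature", "priority": "low"},
    {"project": "gamma", "text": "Wrong result for empty input", "objective": "bug", "priority": "high"},
]


def project_based_splits(issues, test_fraction=0.2, seed=0):
    """Project-based setting: each project is split into its own train and test sets."""
    by_project = defaultdict(list)
    for issue in issues:
        by_project[issue["project"]].append(issue)

    rng = random.Random(seed)
    splits = {}
    for project, items in by_project.items():
        shuffled = items[:]
        rng.shuffle(shuffled)
        # Keep at least one test issue per project, even for tiny projects.
        n_test = max(1, int(len(shuffled) * test_fraction))
        splits[project] = (shuffled[n_test:], shuffled[:n_test])
    return splits


def cross_project_splits(issues):
    """Cross-project setting (assumed leave-one-project-out): train on all
    other projects and test on a project the model has never seen."""
    projects = sorted({issue["project"] for issue in issues})
    for held_out in projects:
        train = [i for i in issues if i["project"] != held_out]
        test = [i for i in issues if i["project"] == held_out]
        yield held_out, train, test


if __name__ == "__main__":
    for project, (train, test) in project_based_splits(ISSUES).items():
        print(f"project-based {project}: {len(train)} train, {len(test)} test")
    for held_out, train, test in cross_project_splits(ISSUES):
        print(f"cross-project, held out {held_out}: {len(train)} train, {len(test)} test")
```

In the cross-project folds no issue from the held-out project appears in training, which mirrors the abstract's claim that the generic model applies to unseen projects or projects with little history. Any classifier for the objective or priority label could be trained on `train` and scored on `test`.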

