Predicting the Objective and Priority of Issue Reports in Software Repositories

Developers collaboratively discuss, implement, use, and share software entities hosted on software repositories. Proper documentation plays an important role in successful software management and maintenance. Users rely on Issue Tracking Systems, a facility of software repositories, to keep track of issue reports, to manage workload and processes, and to document the highlights of their team's effort. An issue report is a rich source of collaboratively curated software knowledge; it can contain a reported problem, a request for new features, or merely a question about the software product. As the number of issues grows, managing them manually becomes harder. GitHub provides labels for tagging issues as a means of issue management. However, about half of the issues in GitHub's top 1000 repositories do not have any labels. We aim to automate the management of issue reports for software teams. We propose a two-stage approach that predicts both the objective behind opening an issue and its priority level, using feature engineering methods and state-of-the-art text classifiers. To the best of our knowledge, we are the first to fine-tune a Transformer for issue classification. We train and evaluate our models in both project-based and cross-project settings. The latter provides a generic prediction model applicable to any unseen software project or to projects with little historical data. Our approach predicts the objective and priority level of issue reports with 82% and 75% accuracy, respectively. Moreover, we conducted human labeling and evaluation on unlabeled issues from six unseen GitHub projects to assess the performance of the cross-project model on new data. The model achieves 90% accuracy. We obtain 85% average Percent Agreement and 71% Randolph's free-marginal Kappa, which translates to substantial agreement among labelers.
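The two agreement statistics reported above can be illustrated with a minimal sketch. It assumes the standard definitions: Percent Agreement as the mean proportion of agreeing rater pairs per item, and Randolph's free-marginal Kappa as (P_o - 1/k) / (1 - 1/k) for k categories. The function names, the label set, and the toy ratings are hypothetical and are not taken from the study's data.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(ratings: Sequence[Sequence[str]]) -> float:
    """Average, over items, of the fraction of rater pairs that agree."""
    per_item = []
    for item in ratings:
        n = len(item)  # number of raters for this item (must be >= 2)
        counts = Counter(item)
        # Ordered pairs of raters that chose the same category.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_item.append(agreeing_pairs / (n * (n - 1)))
    return sum(per_item) / len(per_item)


def randolph_kappa(ratings: Sequence[Sequence[str]], num_categories: int) -> float:
    """Free-marginal kappa: chance agreement is fixed at 1/k."""
    p_observed = percent_agreement(ratings)
    p_chance = 1.0 / num_categories
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical toy data: 4 issues, each labeled by 3 raters.
toy_ratings = [
    ["bug", "bug", "bug"],
    ["bug", "bug", "enhancement"],
    ["support", "support", "support"],
    ["enhancement", "enhancement", "enhancement"],
]

if __name__ == "__main__":
    print(f"Percent Agreement: {percent_agreement(toy_ratings):.3f}")
    print(f"Randolph's kappa:  {randolph_kappa(toy_ratings, 3):.3f}")
```

Because the chance term is fixed at 1/k instead of being estimated from the observed label distribution, this statistic does not penalize raters when one category dominates the sample.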

