Labeling issues with the skills required to complete them can help contributors choose tasks in Open Source Software projects. However, manually labeling issues is time-consuming and error-prone, and current automated approaches are mostly limited to classifying issues as bugs or non-bugs. We investigate the feasibility and relevance of automatically labeling issues with what we call "API-domains," which are high-level categories of APIs. We posit that the APIs used in the source code affected by an issue can serve as a proxy for the type of skills (e.g., DB, security, UI) needed to work on it. We ran a user study (n=74) to assess the relevance of API-domain labels to potential contributors, built prediction models from the issues' descriptions and the project history, and validated the predictions with project contributors (n=20). Our results show that (i) newcomers to the project consider API-domain labels useful in choosing tasks, (ii) labels can be predicted with an average precision of 84% and an average recall of 78.6%, (iii) when training on one project and testing on another (transfer learning), predictions reached up to 71.3% precision and 52.5% recall, and (iv) project contributors consider most of the predictions helpful in identifying the needed skills. These findings suggest that our approach can be applied in practice to automatically label issues, helping developers find tasks that better match their skills.
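To make the "APIs as a proxy for skills" idea and the reported precision/recall figures more concrete, the sketch below shows one way API-domain labels could be derived from the files an issue's fix touched, and how a predicted label set could be scored against them. It is illustrative only, not the authors' implementation. It assumes that labels come from the fully qualified imports in the changed files, and the `API_DOMAINS` prefix table and the helper names (`domains_for_imports`, `label_issue`, `precision_recall`) are hypothetical. How the study averages precision and recall across issues and labels is not specified here.

```python
# Hypothetical mapping from package prefixes to API-domain labels (illustrative only).
API_DOMAINS = {
    "java.sql": "DB",
    "javax.crypto": "Security",
    "java.security": "Security",
    "javax.swing": "UI",
    "java.awt": "UI",
    "java.net": "Network",
}


def domains_for_imports(imports):
    """Return the API-domain labels whose package prefix matches any import."""
    labels = set()
    for imp in imports:
        for prefix, domain in API_DOMAINS.items():
            # Match whole package segments so "java.sqlx" does not count as "java.sql".
            if imp == prefix or imp.startswith(prefix + "."):
                labels.add(domain)
    return labels


def label_issue(changed_files):
    """Label an issue with the union of API domains used in the files it changed.

    Assumption: changed_files maps each file path to the list of fully
    qualified imports found in that file.
    """
    labels = set()
    for imports in changed_files.values():
        labels |= domains_for_imports(imports)
    return sorted(labels)


def precision_recall(predicted, actual):
    """Per-issue precision and recall of a predicted label set vs. the true set."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall
```

For example, an issue whose fix changed one file importing `java.sql.Connection` and another importing `javax.swing.JFrame` would be labeled `["DB", "UI"]`. A model that predicted `["DB", "Network"]` for it would score 0.5 precision and 0.5 recall on that issue.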