Issue Link Label Recovery and Prediction for Open Source Software

Modern open source software development relies heavily on issue tracking systems to manage feature requests, bug reports, tasks, and similar artifacts. Together, these "issues" form a complex network in which issues link to one another. Because issues are heterogeneous, the links between them are of varied types, which makes it challenging for users to create and maintain link labels manually. Most existing automated issue link construction techniques stop at determining whether a link exists between two issues. In this work, we focus on the next important question: can the type of an issue link be assessed automatically through a data-driven method? We analyze the links between issues, and the labels used for those links, in the issue tracking systems of 66 open source projects. Using three projects, we show that supervised machine learning classification, with careful model selection and tuning, gives promising results for the task of link label recovery, achieving F1 scores between 0.56 and 0.70 across the three studied projects. Further, our method performs convincingly on future link label prediction when sufficient historical data is available. Our work is a first step toward systematically managing and maintaining issue links in practice.

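The abstract reports F1 scores for link label recovery but does not say how they are averaged across label types. As a minimal sketch of how such a score could be computed, the following assumes macro averaging (an unweighted mean of per-label F1) and uses hypothetical link labels and predictions; neither the averaging scheme nor the label names come from the paper.

```python
from collections import Counter


def per_label_f1(true_labels, predicted_labels):
    """Return {label: F1} for every label seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # the predicted label gains a false positive
            fn[truth] += 1   # the true label gains a false negative
    scores = {}
    for label in set(true_labels) | set(predicted_labels):
        # F1 = 2*TP / (2*TP + FP + FN)
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    scores = per_label_f1(true_labels, predicted_labels)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical link labels for six issue links (illustrative only).
truth = ["duplicates", "blocks", "relates to", "blocks", "duplicates", "relates to"]
guess = ["duplicates", "relates to", "relates to", "blocks", "relates to", "relates to"]

print(per_label_f1(truth, guess))
print(round(macro_f1(truth, guess), 3))
```

Macro averaging gives rare link types the same weight as common ones; a support-weighted average would favor the frequent labels and could yield a different number on the same predictions.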