An Alternative Issue Tracking Dataset of Public Jira Repositories

Organisations use issue tracking systems (ITSs) to track and document their projects' work in units called issues. This style of documentation encourages evolutionary refinement, as each issue can be independently improved, commented on, linked to other issues, and progressed through the organisational workflow. Commonly studied ITSs so far include GitHub, GitLab, and Bugzilla, while Jira, one of the most popular ITSs in practice and one that offers a wealth of additional information, has yet to receive such attention. Unfortunately, diverse public Jira datasets are rare, likely due to the difficulty of finding and accessing these repositories. With this paper, we release a dataset of 16 public Jiras with 1822 projects, spanning 2.7 million issues with a combined total of 32 million changes, 9 million comments, and 1 million issue links. We believe this Jira dataset will lead to many fruitful research projects investigating issue evolution, issue linking, and cross-project analysis, as well as cross-tool analysis when combined with existing well-studied ITS datasets.

