In both commercial and open-source software, bug reports or issues are used to track bugs or feature requests. However, the quality of issues varies considerably. Prior research has found that bug reports of good quality tend to gain more attention than those of poor quality. As an essential component of an issue, the title is an important aspect of issue quality. Moreover, issues are usually presented in a list view, where only the issue title and some metadata are shown. In this case, a concise and accurate title is crucial for readers to grasp the general concept of the issue, and it facilitates issue triage. Previous work formulated issue title generation as a one-sentence summarization task and employed a sequence-to-sequence model to solve it. However, such a model requires a large amount of domain-specific training data to attain good performance on issue title generation. Recently, pre-trained models, which learn knowledge from large-scale general corpora, have shown great success in software engineering tasks. In this work, we make the first attempt to fine-tune BART, which has been pre-trained on English corpora, to generate issue titles. We implemented the fine-tuned BART as a web tool named iTiger, which can suggest an issue title based on the issue description. iTiger is fine-tuned on 267,094 GitHub issues. We compared iTiger with the state-of-the-art method, i.e., iTAPE, on 33,438 issues. The automatic evaluation shows that iTiger outperforms iTAPE by 29.7%, 50.8%, and 34.1% in terms of ROUGE-1, ROUGE-2, and ROUGE-L F1-scores, respectively. The manual evaluation also demonstrates that the titles generated by BART are preferred by evaluators over the titles generated by iTAPE in 72.7% of cases. Moreover, the evaluators deem our tool useful and easy to use, and they express interest in using it in the future.