Technical debt is a metaphor for sub-optimal solutions that are implemented for short-term benefit at the expense of the long-term maintainability and evolvability of software. A special type of technical debt is explicitly admitted by software engineers (e.g., using a TODO comment); this is called Self-Admitted Technical Debt, or SATD. Most work on automatically identifying SATD focuses on source code comments. In addition to source code comments, issue tracking systems have been shown to be another rich source of SATD, but there are no approaches specifically for automatically identifying SATD in issues. In this paper, we first create a training dataset by collecting and manually analyzing 4,200 issues (which break down into 23,180 issue sections) from seven open-source projects (i.e., Camel, Chromium, Gerrit, Hadoop, HBase, Impala, and Thrift) using two popular issue tracking systems (i.e., Jira and Google Monorail). We then propose and optimize an approach for automatically identifying SATD in issue tracking systems using machine learning. Our findings indicate that: 1) our approach outperforms baseline approaches by a wide margin with regard to the F1-score; 2) transferring knowledge from suitable datasets can improve the predictive performance of our approach; 3) extracted SATD keywords are intuitive and potentially indicate types and indicators of SATD; 4) projects using different issue tracking systems share fewer SATD keywords than projects using the same issue tracking system; 5) only a small amount of training data is needed to achieve good accuracy.