Technical debt refers to taking shortcuts to achieve short-term goals at the expense of the long-term maintainability and evolvability of software systems. A large part of technical debt is explicitly reported by developers themselves; this is commonly referred to as Self-Admitted Technical Debt (SATD). Previous work has focused on identifying SATD in source code comments and issue trackers. However, no approaches are available for automatically identifying SATD in other sources, such as commit messages and pull requests, or for combining multiple sources. Therefore, we propose and evaluate an approach for automated SATD identification that integrates four sources: source code comments, commit messages, pull requests, and issue tracking systems. Our findings show that our approach outperforms baseline approaches and achieves an average F1-score of 0.611 when detecting four types of SATD (i.e., code/design debt, requirement debt, documentation debt, and test debt) from these four sources. We then analyze 23.6M code comments, 1.3M commit messages, 3.7M issue sections, and 1.7M pull request sections to characterize SATD in 103 open-source projects. Furthermore, we investigate SATD keywords and the relations between SATD items in different sources. Among other things, the findings indicate that: 1) SATD is evenly spread among all sources; 2) issues and pull requests are the two most similar sources in terms of the number of shared SATD keywords, followed by commit messages and then code comments; 3) there are four kinds of relations between SATD items in different sources.
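The abstract reports an average F1-score over four SATD types but does not state how that average is taken. As a minimal sketch, assuming a macro average (the unweighted mean of one-vs-rest F1 scores per SATD type), the computation could look like the following. The label strings, the extra "non-SATD" label, the toy labels, and the function names are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# Hypothetical label names for the four SATD types listed in the abstract.
SATD_TYPES = ["code/design", "requirement", "documentation", "test"]


def f1_for_label(y_true: Sequence[str], y_pred: Sequence[str], label: str) -> float:
    """One-vs-rest F1 for a single label; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0  # avoids division by zero; precision or recall is 0 anyway
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: List[str] = SATD_TYPES) -> float:
    """Unweighted mean of per-label F1 scores (assumed macro average).

    Items labelled with anything outside `labels` (e.g. a hypothetical
    "non-SATD" class) still count as false positives/negatives, but get no
    F1 score of their own in the average.
    """
    scores = [f1_for_label(y_true, y_pred, lab) for lab in labels]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy, made-up labels for illustration only.
    y_true = ["test", "test", "documentation", "requirement", "code/design", "non-SATD"]
    y_pred = ["test", "documentation", "documentation", "requirement", "non-SATD", "non-SATD"]
    print(round(macro_f1(y_true, y_pred), 3))  # prints 0.583
```

With these toy labels, the per-type F1 scores are 2/3 (test), 2/3 (documentation), 1.0 (requirement), and 0.0 (code/design), so the macro average is 7/12. A macro average weights rare types such as test debt as heavily as frequent ones; a weighted (support-based) average would instead be dominated by the most common type.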