Developers often opt for easier but suboptimal implementations to meet deadlines or build rapid prototypes, incurring technical debt: the additional effort required to improve the code later. Developers frequently document this debt explicitly in code comments, a practice referred to as Self-Admitted Technical Debt (SATD). Numerous researchers have investigated the impact of SATD on various aspects of software quality and development processes. However, most of these studies focus on SATD in production code, either overlooking SATD in test code or assuming that it shares the characteristics of SATD in production code. In fact, a significant amount of SATD is also present in test code, and many instances do not fit the existing categories defined for production code. This study aims to fill this gap and characterize the nature of SATD in test code by examining its distribution and types; we also analyze the relation between its presence and test quality. Our empirical study of 17,766 SATD comments (14,987 from production code, 2,779 from test code) collected from 50 repositories demonstrates that, while SATD is widespread in test code, it is not directly associated with test smells. We also present a comprehensive categorization of SATD types in test code and develop machine learning models that automatically classify SATD comments by type to ease their management. Our results show that the CodeBERT-based model outperforms the other machine learning models in terms of recall and F1-score, although its performance varies across SATD types.
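To make the classification setup concrete, the following is a minimal sketch, not the authors' exact pipeline, of applying a CodeBERT-based classifier to SATD comments. It assumes the Hugging Face transformers library and the public microsoft/codebert-base checkpoint; the label names are illustrative placeholders rather than the taxonomy derived in this study, and the model must be fine-tuned on labeled SATD comments before its predictions are meaningful.

```python
# Minimal sketch: classify a test-code SATD comment with a CodeBERT-based model.
# Assumptions: Hugging Face transformers + PyTorch installed; labels are hypothetical.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["design-debt", "test-coverage-debt", "documentation-debt", "defect-debt"]  # placeholder label set

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base", num_labels=len(LABELS)
)  # classification head is randomly initialized; fine-tune on labeled SATD data first

comment = "TODO: this test only covers the happy path, add failure cases later"
inputs = tokenizer(comment, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits

predicted = LABELS[int(logits.argmax(dim=-1))]
print(predicted)  # meaningful only after fine-tuning
```

The same encoder-plus-classification-head setup can be compared against traditional baselines (e.g., bag-of-words features with a linear classifier) to reproduce the kind of recall and F1-score comparison summarized above.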