Under pressure from tight release schedules, developers sometimes take design and implementation shortcuts. These shortcuts introduce technical debt that grows as the software evolves, and the debt should be repaid as quickly as possible to limit its impact on software development and software quality. Developers sometimes admit technical debt in comments and commit messages; such debt is known as self-admitted technical debt (SATD). In data-intensive systems, where data manipulation is a critical functionality, SATD in the data-access logic could seriously harm performance and maintainability. Understanding the composition and distribution of SATD across software systems, and how it evolves, could provide insights into managing technical debt efficiently. We present a large-scale empirical study on the prevalence, composition, and evolution of SATD in data-intensive systems. We analyzed 83 open-source systems relying on relational databases and 19 systems relying on NoSQL databases. We detected SATD in source code comments obtained from different snapshots of the subject systems. To understand the evolution dynamics of SATD, we conducted a survival analysis. We then manually analyzed a sample of 361 data-access SATD instances, investigating their composition and the reasons behind their introduction and removal. We identified 15 new SATD categories, 11 of which are specific to database access operations. We found that most data-access SATD is introduced in the later stages of the change history rather than at the beginning. We also observed that bug fixing and refactoring are the main reasons behind the introduction of data-access SATD.
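The abstract does not say which estimator the survival analysis uses. As an illustration only, the sketch below assumes a Kaplan-Meier estimator applied to SATD comments, where each comment has a lifetime (for example, measured in commits or days) and a flag indicating whether it was removed or was still present at the last snapshot (censored). The function name, the input layout, and the sample data are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float], removed: List[bool]) -> List[Tuple[float, float]]:
    """Estimate the survival curve of SATD comments (hypothetical helper).

    durations[i] is how long comment i was observed; removed[i] is True if the
    comment was removed at that time, False if it was still present when
    observation ended (right-censored).
    Returns (time, survival probability) pairs at each time a removal occurred.
    """
    if len(durations) != len(removed):
        raise ValueError("durations and removed must have the same length")

    # Distinct times at which at least one removal was observed.
    event_times = sorted({t for t, e in zip(durations, removed) if e})

    survival = 1.0
    curve = []
    for t in event_times:
        # Comments still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Comments removed exactly at time t.
        events = sum(1 for d, e in zip(durations, removed) if e and d == t)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Made-up lifetimes for five SATD comments; two are censored.
    lifetimes = [1, 2, 2, 3, 4]
    was_removed = [True, True, False, True, False]
    for time, prob in kaplan_meier(lifetimes, was_removed):
        print(f"t={time}: S(t)={prob:.2f}")
```

Censored comments still count toward the at-risk set until their last observed time, which is why a survival estimator is preferable to simply averaging the lifetimes of removed comments.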