A vigorous and growing set of technical debt analysis tools has been developed in recent years -- both research tools and industrial products -- such as Structure 101, SonarQube, and DV8. Each of these tools identifies problematic files using its own definitions and measures. But to what extent do these tools agree with each other in terms of the files that they identify as problematic? If the top-ranked files reported by these tools are largely consistent, then we can be confident in using any of these tools. Otherwise, a problem of accuracy arises. In this paper, we report the results of an empirical study analyzing 10 projects using multiple tools. Our results show that: 1) These tools report very different results even for the most common measures, such as size, complexity, file cycles, and package cycles. 2) These tools also differ dramatically in terms of the set of problematic files they identify, since each implements its own definition of "problematic". After normalizing by size, the most problematic file sets that the tools identify barely overlap. 3) Code-based measures, other than size and complexity, do not even moderately correlate with a file's change-proneness or error-proneness. In contrast, co-change-related measures perform better. Our results suggest that, to identify files with true technical debt -- those that experience excessive changes or bugs -- co-change information must be considered. Code-based measures are largely ineffective at pinpointing true debt. Finally, this study reveals the need for the community to create benchmarks and data sets to assess the accuracy of software analysis tools in terms of commonly used measures.
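To make the notion of "barely overlapping" top-ranked file sets concrete, the following is a minimal sketch of one way such agreement could be quantified. The abstract does not say which overlap metric the study uses; the Jaccard index, the helper names `top_k` and `jaccard`, and the scores below are all assumptions made for illustration, not data or methods from the paper.

```python
def top_k(scores: dict[str, float], k: int) -> set[str]:
    """Return the k files with the highest score (ties broken by file name)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return {name for name, _ in ranked[:k]}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical per-file "problematic" scores from two tools (made-up values).
tool_a = {"A.java": 9.0, "B.java": 7.5, "C.java": 3.0, "D.java": 1.0}
tool_b = {"A.java": 2.0, "B.java": 8.0, "C.java": 1.0, "D.java": 6.0}

# Compare the two tools' top-2 files.
overlap = jaccard(top_k(tool_a, 2), top_k(tool_b, 2))
print(f"top-2 overlap: {overlap:.2f}")
```

A value near 1 would indicate that the two tools flag largely the same files, while a value near 0 corresponds to the low agreement the abstract describes.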