What Made This Test Flake? Pinpointing Classes Responsible for Test Flakiness

Flaky tests are tests that behave non-deterministically, passing and failing intermittently for the same version of the code. These tests cripple continuous integration with false alerts that waste developers' time and erode their trust in regression testing. To mitigate the effects of flakiness, both researchers and industrial experts have proposed strategies and tools to detect and isolate flaky tests. However, flaky tests are rarely fixed, as developers struggle to localise and understand their causes. Additionally, developers working with large codebases often need to know the sources of non-determinism in order to preserve code quality, i.e., to avoid introducing technical debt linked with non-deterministic behaviour and to avoid introducing new flaky tests. To aid with these tasks, we propose re-targeting Fault Localisation techniques to the flaky component localisation problem, i.e., pinpointing the program classes that cause the non-deterministic behaviour of flaky tests. In particular, we employ Spectrum-Based Fault Localisation (SBFL), a coverage-based fault localisation technique commonly adopted for its simplicity and effectiveness. We also utilise other data sources, such as change history and static code metrics, to further improve the localisation. Our results show that augmenting SBFL with change and code metrics ranks flaky classes in the top-1 and top-5 suggestions in 26% and 47% of the cases, respectively. Overall, we reduced the average number of classes inspected to locate the first flaky class to 19% of the total number of classes covered by flaky tests. Our results also show that localisation methods are effective in major flakiness categories, such as concurrency and asynchronous waits, indicating their general ability to identify flaky components.
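To make the SBFL idea concrete, the sketch below ranks classes by how strongly their coverage is associated with failing executions. It is an illustration under stated assumptions, not the method evaluated in the paper: the abstract does not say which SBFL formula is used, so the widely used Ochiai formula is assumed here, and the function name `ochiai_ranking`, the class names, and the coverage data are all hypothetical.

```python
import math
from collections import defaultdict


def ochiai_ranking(runs):
    """Rank classes by Ochiai suspiciousness (assumed formula, see lead-in).

    runs: list of (covered_classes, failed) pairs, one per test execution,
          where covered_classes is a set of class names and failed is a bool.
    Returns a list of (class_name, score) sorted from most to least suspicious.
    """
    total_failed = sum(1 for _, failed in runs if failed)

    # ef[c]: failing executions that cover class c
    # ep[c]: passing executions that cover class c
    ef = defaultdict(int)
    ep = defaultdict(int)
    for covered, failed in runs:
        for cls in covered:
            if failed:
                ef[cls] += 1
            else:
                ep[cls] += 1

    scores = {}
    for cls in set(ef) | set(ep):
        # Ochiai: ef / sqrt(total_failed * (ef + ep))
        denom = math.sqrt(total_failed * (ef[cls] + ep[cls]))
        scores[cls] = ef[cls] / denom if denom else 0.0

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical coverage data: two failing and two passing executions.
runs = [
    ({"Cache", "Scheduler"}, True),
    ({"Cache", "Scheduler", "Parser"}, True),
    ({"Cache", "Parser"}, False),
    ({"Cache"}, False),
]

ranking = ochiai_ranking(runs)
for cls, score in ranking:
    print(f"{cls}: {score:.3f}")
```

In this toy data, `Scheduler` is covered only by failing executions and therefore ranks first, while `Cache` is covered by every execution and ranks lower. The change-history and code-metric signals mentioned in the abstract are not modelled in this sketch.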
