Test flakiness is a problem that affects testing and the processes that rely on it. Several factors cause or influence the flakiness of test outcomes. Test execution order, randomness, and concurrency are among the more common and well-studied causes. Some studies mention code instrumentation as a factor that causes or affects test flakiness, but evidence for this issue is scarce. In this study, we attempt to systematically collect evidence for the effects of instrumentation on test flakiness. We experiment with common types of instrumentation for Java programs, namely application performance monitoring, coverage, and profiling instrumentation. We then study the effects of instrumentation on nine programs drawn from an existing dataset used to study test flakiness, consisting of popular GitHub projects written in Java. We observe cases where real-world instrumentation causes flakiness in a program; however, such cases are rare. We also discuss a related issue: how instrumentation may interfere with flakiness detection and prevention.