Flaky tests (tests with non-deterministic outcomes) pose a major challenge for software testing. They are known to cause significant problems, such as reducing the effectiveness and efficiency of testing and delaying software releases. In recent years, interest in flaky tests has grown, with research focusing on different aspects of flakiness, such as its causes, detection methods and mitigation strategies. Test flakiness has also become a key discussion point for practitioners (in blog posts, technical magazines, etc.), as the impact of flaky tests is felt across the industry. This paper presents a multivocal review that investigates how flaky tests, as a topic, have been addressed in both research and practice. We cover a total of 651 articles (560 academic articles and 91 grey literature articles/posts) and structure the body of relevant research and knowledge along four dimensions: causes, detection, impact and responses. For each dimension, we provide a categorisation and classify existing research, discussions, methods and tools. With this, we provide a comprehensive and current snapshot of existing thinking on test flakiness, covering both academic views and industrial practices, and identify limitations and opportunities for future research.