Regression testing is an important phase in delivering quality software. However, flaky tests hamper the evaluation of test results and can increase costs: a flaky test may pass or fail non-deterministically, and properly identifying flakiness requires rerunning the test suite multiple times. To cope with this challenge, approaches based on prediction models and machine learning have been proposed. Existing approaches that rely on test case vocabulary may be context-sensitive and prone to overfitting, and they perform poorly in a cross-project scenario. To overcome these limitations, we investigate the use of test smells as predictors of flaky tests. We conducted an empirical study to understand whether a test smell-based classifier performs well at predicting flakiness in a cross-project context, and we analyzed the information gain of each test smell. We also compared the test smell-based approach with the vocabulary-based one. As a result, we obtained a classifier with reasonable performance (Random Forest, 0.83) for predicting flakiness in the testing phase. This classifier outperformed the vocabulary-based model in cross-project prediction. The Assertion Roulette and Sleepy Test smell types are associated with the highest information gain values.
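To illustrate what ranking test smells by information gain can look like, the following is a minimal sketch, not the procedure used in the study. It assumes each test case is described by binary smell indicators and a binary flaky label. The toy data, the `Placeholder Smell` column, and the helper names `entropy` and `information_gain` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction in `labels` after splitting on the values of `feature`."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data (NOT taken from the study): one entry per test case.
# Labels: 1 = flaky, 0 = not flaky. Smell columns: 1 = smell present, 0 = absent.
is_flaky = [1, 1, 1, 0, 0, 0, 0, 1]
smells = {
    "Sleepy Test": [1, 1, 1, 0, 0, 0, 0, 0],
    "Assertion Roulette": [1, 1, 0, 0, 0, 1, 0, 1],
    "Placeholder Smell": [1, 0, 1, 0, 1, 0, 1, 0],  # made-up, uninformative column
}

# Rank smells from most to least informative about the flaky label.
ranking = sorted(
    ((information_gain(column, is_flaky), name) for name, column in smells.items()),
    reverse=True,
)
for gain, name in ranking:
    print(f"{name}: {gain:.3f}")
```

A smell whose presence splits the tests into mostly-flaky and mostly-stable groups receives a high gain, while a smell distributed evenly across both groups receives a gain close to zero.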