Background: Machine learning (ML) may enable effective automated test generation. Aims: We characterize emerging research, examining testing practices, researcher goals, ML techniques applied, evaluation, and challenges. Method: We perform a systematic literature review on a sample of 97 publications. Results: ML generates input for system, GUI, unit, performance, and combinatorial testing, or improves the performance of existing generation methods. ML is also used to generate test verdict, property-based, and expected output oracles. Supervised learning, often based on neural networks, and reinforcement learning, often based on Q-learning, are common, and some publications also employ unsupervised or semi-supervised learning. (Semi-/un-)supervised approaches are evaluated using both traditional testing metrics and ML-related metrics (e.g., accuracy), while reinforcement learning is often evaluated using testing metrics tied to the reward function. Conclusions: Work to date shows great promise, but open challenges remain regarding training data, retraining, scalability, evaluation complexity, the ML algorithms employed and how they are applied, benchmarks, and replicability. Our findings can serve as a roadmap and inspiration for researchers in this field.