Context: Machine learning (ML) may enable effective automated test generation. Objective: We characterize emerging research, examining testing practices, researcher goals, ML techniques applied, evaluation, and challenges. Methods: We perform a systematic literature review on a sample of 97 publications. Results: ML generates input for system, GUI, unit, performance, and combinatorial testing, or improves the performance of existing generation methods. ML is also used to generate test verdicts and property-based and expected-output oracles. Supervised learning, often based on neural networks, and reinforcement learning, often based on Q-learning, are common; some publications also employ unsupervised or semi-supervised learning. Supervised, semi-supervised, and unsupervised approaches are evaluated using both traditional testing metrics and ML-related metrics (e.g., accuracy), while reinforcement learning is often evaluated using testing metrics tied to the reward function. Conclusion: Work to date shows great promise, but there are open challenges regarding training data, retraining, scalability, evaluation complexity, the ML algorithms employed and how they are applied, benchmarks, and replicability. Our findings can serve as a roadmap and inspiration for researchers in this field.
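To make concrete the pattern the review highlights, in which reinforcement learning is evaluated (and driven) by a testing metric tied to the reward function, the following is a minimal, self-contained sketch: tabular Q-learning that selects test inputs for a toy system under test, rewarding newly covered branches. It is an illustration under stated assumptions, not the implementation of any surveyed publication; the system under test and all names are hypothetical.

```python
# Sketch: RL-based test input generation where the reward is a testing
# metric (newly covered branches), as characterized in the review.
import random
from collections import defaultdict

def sut(x: int) -> str:
    """Hypothetical toy system under test with three branches to cover."""
    if x < 0:
        return "negative"
    if x % 2 == 0:
        return "even"
    return "odd"

ACTIONS = list(range(-5, 6))           # candidate test inputs
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration

q = defaultdict(float)                 # Q[(state, action)] -> value
covered: set[str] = set()              # branches covered so far

# Each episode is one input; the abstract state is the coverage achieved,
# so the agent learns to pick inputs that reach uncovered branches.
for episode in range(200):
    state = frozenset(covered)
    if random.random() < EPSILON:      # epsilon-greedy action selection
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
    branch = sut(action)               # stand-in for coverage instrumentation
    reward = 1.0 if branch not in covered else 0.0  # testing metric as reward
    covered.add(branch)
    next_state = frozenset(covered)
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])

print(f"Branches covered: {sorted(covered)}")  # expect all three branches
```

Note how the evaluation metric (branch coverage) and the reward coincide here; this coupling is exactly why, as the Results state, reinforcement-learning approaches are often evaluated using testing metrics tied to the reward function rather than ML-related metrics such as accuracy.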