Context: Machine learning (ML) may enable effective automated test generation. Objective: We characterize emerging research, examining testing practices, researcher goals, ML techniques applied, evaluation, and challenges. Methods: We perform a systematic mapping on a sample of 102 publications. Results: ML generates input for system, GUI, unit, performance, and combinatorial testing, or improves the performance of existing generation methods. ML is also used to generate test verdicts, property-based oracles, and expected-output oracles. Supervised learning, often based on neural networks, and reinforcement learning, often based on Q-learning, are common, and some publications also employ unsupervised or semi-supervised learning. (Semi-/un-)supervised approaches are evaluated using both traditional testing metrics and ML-related metrics (e.g., accuracy), while reinforcement learning is often evaluated using testing metrics tied to the reward function. Conclusion: Work to date shows great promise, but there are open challenges regarding training data, retraining, scalability, evaluation complexity, the ML algorithms employed and how they are applied, benchmarks, and replicability. Our findings can serve as a roadmap and inspiration for researchers in this field.