Testing Deep Learning (DL) systems is complex because they do not behave like traditional systems, notably because of their stochastic nature. Nonetheless, adapting existing testing techniques such as Mutation Testing (MT) to DL settings would greatly improve their verifiability. While some efforts have been made to extend MT to the Supervised Learning (SL) paradigm, little work has gone into extending it to Reinforcement Learning (RL), which is also an important component of the DL ecosystem but behaves very differently from SL. This paper builds on existing MT approaches to propose RLMutation, a framework for MT applied to RL. Notably, we use existing taxonomies of faults to build a set of mutation operators relevant to RL, and we use a simple heuristic to generate test cases for RL. This allows us to compare different mutation killing definitions based on existing approaches, and to analyze the behavior of the obtained mutation operators and of their potential combinations, called Higher Order Mutations (HOMs). We show that the choice of mutation killing definition can affect whether a mutation is killed, as well as which test cases are generated. Moreover, we found that even with a relatively small number of test cases and operators, we can generate HOMs with interesting properties that can enhance testing capability in RL systems.