Evaluation of deep reinforcement learning (RL) is inherently challenging. In particular, the opaqueness of learned policies and the stochastic nature of both agents and environments make it difficult to test the behavior of deep RL agents. We present a search-based testing framework that enables a wide range of novel analysis capabilities for evaluating the safety and performance of deep RL agents. For safety testing, our framework utilizes a search algorithm that searches for a reference trace that solves the RL task. The backtracking states of the search, called boundary states, represent safety-critical situations. We create safety test suites that evaluate how well the RL agent escapes safety-critical situations near these boundary states. For robust performance testing, we create a diverse set of traces via fuzz testing. These fuzz traces are used to bring the agent into a wide variety of potentially unknown states, from which the average performance of the agent is compared to the average performance of the fuzz traces. We apply our search-based testing approach to RL agents trained on Nintendo's Super Mario Bros.
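To make the boundary-state idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation): a depth-first search over a toy grid world stands in for the search for a reference trace, with `#` cells marking unsafe states. Whenever the search must backtrack out of a state because every successor is unsafe or already explored, that state is recorded as a boundary state, i.e. a candidate safety-critical situation for a test suite. The grid layout and all names here are illustrative assumptions.

```python
# Toy grid: 'S' start, 'G' goal, '#' unsafe (e.g. the agent falls into a pit).
GRID = [
    "S.#.",
    ".#..",
    ".###",
    "...G",
]
MOVES = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up

def search_with_boundary_states(grid):
    """Return (reference_trace, boundary_states) for the toy grid world."""
    rows, cols = len(grid), len(grid[0])
    start = next((r, c) for r in range(rows) for c in range(cols) if grid[r][c] == "S")
    goal = next((r, c) for r in range(rows) for c in range(cols) if grid[r][c] == "G")
    visited = {start}
    trace = [start]       # the reference trace under construction
    boundary = []         # states the search backtracks out of

    def dfs(state):
        if state == goal:
            return True
        for dr, dc in MOVES:
            nxt = (state[0] + dr, state[1] + dc)
            r, c = nxt
            if not (0 <= r < rows and 0 <= c < cols):
                continue                      # off the map
            if grid[r][c] == "#" or nxt in visited:
                continue                      # unsafe cell or already explored
            visited.add(nxt)
            trace.append(nxt)
            if dfs(nxt):
                return True
            trace.pop()                       # undo: this successor led nowhere
        boundary.append(state)                # search backtracks out of `state`
        return False

    dfs(start)
    return trace, boundary

trace, boundary = search_with_boundary_states(GRID)
print("reference trace:", trace)
print("boundary states:", boundary)
```

A safety test suite in the sense of the abstract would then reset the environment to states near each entry of `boundary` and check whether the trained agent escapes the critical situation; the real framework operates on an actual RL environment rather than a hand-written grid.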