Deep reinforcement learning (DRL) has become a dominant deep-learning paradigm for various tasks in which complex policies are learned within reactive systems. In parallel, there has recently been significant research on verifying deep neural networks. However, to date, there has been little work demonstrating the use of modern verification tools on real, DRL-controlled systems. In this case-study paper, we attempt to begin bridging this gap, and focus on the important task of mapless robotic navigation -- a classic robotics problem, in which a robot, usually controlled by a DRL agent, needs to efficiently and safely navigate through an unknown arena towards a desired target. We demonstrate how modern verification engines can be used for effective model selection, i.e., the process of selecting the best available policy for the robot in question from a pool of candidate policies. Specifically, we use verification to detect and rule out policies that may demonstrate suboptimal behavior, such as collisions and infinite loops. We also apply verification to identify models with overly conservative behavior, thus allowing users to choose superior policies that are better at finding an optimal, shorter path to a target. To validate our work, we conducted extensive experiments on an actual robot, and confirmed that the suboptimal policies detected by our method were indeed flawed. We also compared our verification-driven approach to state-of-the-art gradient attacks, and our results demonstrate that gradient-based methods are inadequate in this setting. Our work is the first to demonstrate the use of DNN verification backends for recognizing suboptimal DRL policies in real-world robots, and for filtering out unwanted policies. We believe that the methods presented in this work can be applied to a large range of application domains that incorporate deep-learning-based agents.
翻译:深加强化学习( DRL) 已经成为各种任务的主要深层学习模式,在这些任务中,应对反应系统学习复杂的政策。与此同时,最近对深心神经网络进行了重大研究。然而,迄今为止,在实际的DRL控制系统中,几乎没有什么工作表明使用现代核查工具。在本案例研究文件中,我们试图缩小这一差距,并侧重于无地图机器人导航这一重要任务 -- -- 一个典型的机器人问题,在这个问题上,一个机器人,通常由路由路由路由路由者控制的机器人,需要高效率、安全地在未知的舞台中度过一个理想的目标。我们展示了现代核查引擎如何能够用于有效的模型选择,即,从候选政策库中选择有关机器人的最佳现有政策。具体地说,我们利用核查工具来检测和排除可能显示不完美的行为的政策,例如碰撞和无限循环。我们还应用核查来鉴别基于过于保守的行为模式,从而让用户选择更好的政策,从而更好地找到最佳、更短的路径到一个目标。为了验证我们的工作,我们还进行了广泛的测试,在实际的机器人核查方法中,我们演示了一种真实的机器人操作方法,我们演示了我们所演示了一种机械化的方法。