Deep reinforcement learning is an increasingly popular technique for synthesising policies to control an agent's interaction with its environment. There is also growing interest in formally verifying that such policies are correct and execute safely. Progress has been made in this area by building on existing work for verification of deep neural networks and of continuous-state dynamical systems. In this paper, we tackle the problem of verifying probabilistic policies for deep reinforcement learning, which are used, for example, to tackle adversarial environments, break symmetries and manage trade-offs. We propose an abstraction approach, based on interval Markov decision processes, that yields probabilistic guarantees on a policy's execution, and present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking. We implement our approach and illustrate its effectiveness on a selection of reinforcement learning benchmarks.
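To make the notion of probabilistic guarantees over an interval Markov decision process concrete, the following is a minimal sketch (not the paper's implementation) of interval value iteration on a toy interval abstraction: given interval-valued transition probabilities, it computes an upper bound on the probability of reaching a failure state within a finite horizon by resolving each interval pessimistically. All names, the toy model and the horizon are illustrative assumptions.

```python
# Minimal sketch of interval value iteration (assumed illustration, not the paper's tool).
# States are integers; transitions map a state to (successor, prob_lo, prob_hi) triples.
from typing import Dict, List, Set, Tuple

Transitions = Dict[int, List[Tuple[int, float, float]]]

def worst_case_step(succs: List[Tuple[int, float, float]], value: List[float]) -> float:
    """Choose a distribution within the intervals that maximises the expected
    successor value (the pessimistic choice for a failure probability)."""
    # Start from all lower bounds, then greedily push the remaining mass
    # (up to each upper bound) onto the highest-value successors.
    probs = {s: lo for s, lo, hi in succs}
    remaining = 1.0 - sum(probs.values())
    for s, lo, hi in sorted(succs, key=lambda t: value[t[0]], reverse=True):
        bump = min(hi - lo, remaining)
        probs[s] += bump
        remaining -= bump
    return sum(p * value[s] for s, p in probs.items())

def max_fail_probability(trans: Transitions, fail: Set[int], horizon: int, n: int) -> List[float]:
    """Upper bound, per state, on the probability of reaching a fail state within `horizon` steps."""
    value = [1.0 if s in fail else 0.0 for s in range(n)]
    for _ in range(horizon):
        value = [1.0 if s in fail else worst_case_step(trans[s], value) for s in range(n)]
    return value

# Toy example: from state 0 the system fails (reaches state 2) with probability in [0.1, 0.3].
toy: Transitions = {
    0: [(1, 0.7, 0.9), (2, 0.1, 0.3)],
    1: [(1, 1.0, 1.0)],
    2: [(2, 1.0, 1.0)],
}
print(max_fail_probability(toy, fail={2}, horizon=5, n=3)[0])  # 0.3, the pessimistic bound
```

In the paper's setting, such an interval model would be obtained by abstracting the policy and environment dynamics (e.g. via abstract interpretation and mixed-integer linear programming), and the bounds would be computed with a probabilistic model checker rather than the hand-rolled loop above.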