Multi-agent reinforcement learning (RL) often struggles to ensure the safe behaviour of the learning agents, and is therefore generally unsuited to safety-critical applications. To address this issue, we present a methodology that combines formal verification with (deep) RL algorithms to guarantee the satisfaction of formally specified safety constraints during both training and testing. The proposed approach expresses the constraints to be verified in Probabilistic Computation Tree Logic (PCTL) and builds an abstract representation of the system to reduce the complexity of the verification step. This abstract model allows model-checking techniques to identify a set of abstract policies that satisfy the safety constraints expressed in PCTL. The agents' behaviours are then restricted according to these safe abstract policies. We formally guarantee that, under this method, the actions of the agents always meet the safety constraints, and we provide a procedure to generate the abstract model automatically. We empirically evaluate our method and demonstrate its effectiveness in a multi-agent environment.
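To make the final step of the pipeline concrete, the sketch below illustrates how agents' behaviours might be restricted at execution time to actions certified by model checking. It is a minimal illustration, not the paper's implementation: the abstraction function, the action sets, and the PCTL property `P<=0.01 [ F "collision" ]` mentioned in the comments are all hypothetical, standing in for the outputs of the verification step described above.

```python
import random

# Hypothetical table of safe abstract policies: for each abstract state,
# the set of actions that model checking certified against a PCTL
# constraint such as P<=0.01 [ F "collision" ]. In the methodology above,
# this table would be produced by verifying the abstract model.
SAFE_ACTIONS = {
    "near_obstacle": {"stop", "turn_left"},
    "clear": {"stop", "turn_left", "turn_right", "forward"},
}


def abstraction(state):
    """Hypothetical mapping from a concrete state to an abstract state."""
    return "near_obstacle" if state["distance"] < 1.0 else "clear"


def shielded_action(state, rl_action):
    """Restrict the agent's behaviour to a safe abstract policy.

    If the action proposed by the RL policy is certified safe in the
    current abstract state, it is executed; otherwise a certified-safe
    fallback is chosen, so the safety constraint holds throughout
    training and testing.
    """
    allowed = SAFE_ACTIONS[abstraction(state)]
    if rl_action in allowed:
        return rl_action
    return random.choice(sorted(allowed))  # any certified-safe fallback


# Example: the learner proposes "forward" while close to an obstacle,
# and the shield substitutes an action from the safe set instead.
print(shielded_action({"distance": 0.4}, "forward"))
```

Because the restriction is applied to every action the learner proposes, the formal guarantee reduces to the correctness of the certified action sets, independently of how the underlying (deep) RL policy evolves during training.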