While conventional reinforcement learning focuses on designing agents that can perform a single task, meta-learning instead aims to design agents that generalize to tasks (e.g., environments, obstacles, and goals) that were not considered during their design or training. In this spirit, we consider the problem of training a provably safe Neural Network (NN) controller for uncertain nonlinear dynamical systems that generalizes to new tasks not present in the training data while preserving strong safety guarantees. Our approach is to learn a set of NN controllers during the training phase. When the task becomes available at runtime, our framework carefully selects a subset of these NN controllers and composes them to form the final NN controller. Critical to our approach is the ability to compute a finite-state abstraction of the nonlinear dynamical system. This abstract model captures the behavior of the closed-loop system under all possible NN weights; it is used both to train the NNs and to compose them once the task is known. We provide theoretical guarantees that govern the correctness of the resulting NN controller. We evaluate our approach on the problem of controlling a wheeled robot in cluttered environments that were not present in the training data.
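The composition step can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a hypothetical grid-shaped finite-state abstraction whose transition map over-approximates the closed-loop behavior under each pretrained controller, and it composes controllers at runtime via backward reachability over that abstraction once the task (goal and obstacles) is known. All names and structures below are illustrative.

```python
def build_abstraction(n=4):
    """Toy finite-state abstraction: states are cells of an n x n grid, and
    each of four hypothetical pretrained controllers (one motion primitive
    each) induces a set of possible abstract successors.  In the paper this
    map would over-approximate the closed-loop dynamics under all NN weights;
    here it is a deterministic stand-in for illustration."""
    post = {}
    for s in range(n * n):
        r, c = divmod(s, n)
        moves = {"up": (r - 1, c), "down": (r + 1, c),
                 "left": (r, c - 1), "right": (r, c + 1)}
        for ctrl, (nr, nc) in moves.items():
            if 0 <= nr < n and 0 <= nc < n:
                post[(s, ctrl)] = {nr * n + nc}
    return post

def compose_controllers(post, goal, obstacles):
    """Backward reachability over the abstraction: a state becomes 'winning'
    once some controller sends all of its abstract successors into the
    winning set.  The returned map state -> controller is the composed
    runtime controller; because each state's chosen successors entered the
    winning set strictly earlier, every closed-loop run reaches the goal."""
    winning = {goal}
    policy = {}
    changed = True
    while changed:
        changed = False
        for (s, ctrl), succs in post.items():
            if s == goal or s in obstacles or s in policy:
                continue
            if succs <= winning:  # all successors already safe-to-goal
                policy[s] = ctrl
                winning.add(s)
                changed = True
    return policy

# Runtime: a task (goal + obstacles) arrives, and we compose the controllers.
post = build_abstraction(4)
policy = compose_controllers(post, goal=15, obstacles={5, 6})
```

Simulating the composed controller from state 0 then steps through `policy` until the goal cell is reached, never entering an obstacle cell, because obstacle states are excluded from the winning set and thus from every chosen successor set.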