This article proposes a hierarchical learning architecture for safe data-driven control in unknown environments. We consider a constrained nonlinear dynamical system and assume the availability of state-input trajectories solving control tasks in different environments. In addition to task-invariant system state and input constraints, a parameterized environment model generates task-specific state constraints, which are satisfied by the stored trajectories. Our goal is to use these trajectories to find a safe and high-performing policy for a new task in a new, unknown environment. We propose using the stored data to learn generalizable control strategies. At each time step, based on a local forecast of the new task environment, the learned strategy consists of a target region in the state space and input constraints that guide the system evolution toward that target region. These target regions are used as terminal sets by a low-level model predictive controller. We show how to i) design the target sets from past data and ii) incorporate them into a model predictive control scheme with a shifting horizon that ensures safety of the closed-loop system when performing the new task. We prove the feasibility of the resulting control policy and apply the proposed method to robotic path planning, racing, and computer game applications.
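The abstract does not say how target sets are built from stored data or how the low-level controller is solved, so the following is only a toy sketch of the overall loop under our own assumptions. The grid dynamics, the helper names (`target_set`, `mpc_step`, `run`), the forecast `window`, and the rule "target set = stored states exactly `horizon` steps ahead that are safe under the local forecast" are all hypothetical illustrations, not the method of the article. The sketch also omits the learned input constraints and replaces the article's shifting-horizon scheme with a plain receding horizon that shrinks near the goal, solved by brute-force enumeration.

```python
from itertools import product

# Toy system (an illustrative assumption, not the paper's model): a point on an
# integer grid. Each step advances x by 1; the input u in {-1, 0, 1} shifts y.
INPUTS = (-1, 0, 1)
Y_MAX = 3  # task-invariant state constraint: |y| <= Y_MAX


def step(state, u):
    x, y = state
    return (x + 1, y + u)


def is_safe(state, obstacles):
    """Task-invariant bound plus task-specific obstacle constraint."""
    return abs(state[1]) <= Y_MAX and state not in obstacles


def target_set(state, stored_trajectories, forecast, horizon):
    """Hypothetical strategy: stored states exactly `horizon` columns ahead
    that remain safe under the local environment forecast."""
    goal_x = state[0] + horizon
    return {s for traj in stored_trajectories for s in traj
            if s[0] == goal_x and is_safe(s, forecast)}


def mpc_step(state, targets, forecast, horizon):
    """Brute-force finite-horizon MPC: among input sequences whose predicted
    states are safe and whose final state lies in the terminal (target) set,
    return the first input of the cheapest one."""
    best = None
    for seq in product(INPUTS, repeat=horizon):
        s, cost, ok = state, 0, True
        for u in seq:
            s = step(s, u)
            if not is_safe(s, forecast):
                ok = False
                break
            cost += abs(u)  # penalize lateral motion
        if ok and s in targets and (best is None or cost < best[0]):
            best = (cost, seq[0])
    if best is None:
        raise RuntimeError("no feasible input sequence reaches the target set")
    return best[1]


def run(start, goal_x, obstacles, stored_trajectories, horizon=3, window=4):
    """Closed loop: re-plan at every step and apply only the first input."""
    assert window >= horizon, "forecast must cover the prediction horizon"
    state, path = start, [start]
    while state[0] < goal_x:
        # Local forecast: only obstacles within `window` columns ahead are known.
        forecast = {o for o in obstacles
                    if state[0] < o[0] <= state[0] + window}
        h = min(horizon, goal_x - state[0])  # shrink the horizon near the goal
        targets = target_set(state, stored_trajectories, forecast, h)
        state = step(state, mpc_step(state, targets, forecast, h))
        path.append(state)
    return path


# Stored data from past tasks (made up for illustration): lanes at y = 0, 2, -2.
stored = [[(x, y) for x in range(11)] for y in (0, 2, -2)]
# New environment: a block in column 5 covering y = -1, 0, 1.
obstacles = {(5, -1), (5, 0), (5, 1)}
print(run((0, 0), 8, obstacles, stored))
# [(0, 0), (1, 0), (2, 0), (3, -1), (4, -2), (5, -2), (6, -2), (7, -2), (8, -2)]
```

Because every predicted state lies inside the forecast window, a state that is safe under the forecast is also safe in the true environment, so the applied trajectory never enters an obstacle. When no stored state ahead is reachable (for example, a full wall), the controller reports infeasibility instead of acting.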