This paper presents a policy parameterization for learning-based control on nonlinear, partially observed dynamical systems. The parameterization is based on a nonlinear version of the Youla parameterization and the recently proposed Recurrent Equilibrium Network (REN) class of models. We prove that the resulting Youla-REN parameterization automatically satisfies stability (contraction) and user-tunable robustness (Lipschitz) conditions on the closed-loop system. It can therefore be used for safe learning-based control with no additional constraints or projections required to enforce stability or robustness. We test the new policy class in simulation on two reinforcement learning tasks: 1) magnetic suspension, and 2) inverting a rotary-arm pendulum. We find that the Youla-REN performs similarly to existing learning-based and optimal control methods while also ensuring stability and exhibiting improved robustness to adversarial disturbances.
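For readers unfamiliar with the two closed-loop properties named above, the following sketches how contraction and a Lipschitz bound are commonly stated for a discrete-time system. The notation is an assumption made for illustration and is not taken from the paper: $x_t$ is the closed-loop state, $w$ a disturbance sequence, $z$ the performance output, $K$ and $\alpha$ the overshoot and decay rate, and $\gamma$ the Lipschitz bound.

```latex
% Contraction: two closed-loop trajectories driven by the same disturbance w
% but started from different initial states x_0^a, x_0^b converge exponentially.
\[
  \| x_t^a - x_t^b \| \le K \alpha^{t} \, \| x_0^a - x_0^b \|,
  \qquad K > 0, \quad \alpha \in [0, 1), \quad t = 0, 1, 2, \dots
\]

% Lipschitz bound: from the same initial state, the gap between the outputs
% z^a, z^b is bounded by gamma times the gap between disturbances w^a, w^b.
\[
  \sqrt{\sum_{t=0}^{T} \| z_t^a - z_t^b \|^2}
  \;\le\;
  \gamma \sqrt{\sum_{t=0}^{T} \| w_t^a - w_t^b \|^2}
  \qquad \text{for all } T \ge 0 .
\]
```

Under this reading, a smaller $\gamma$ means the closed loop is less sensitive to disturbances, which is presumably the sense in which the robustness level is user-tunable.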