Learning a dynamical system requires stabilizing the unknown dynamics to avoid state blow-ups. However, current reinforcement learning (RL) methods lack stabilization guarantees, which limits their applicability to the control of safety-critical systems. We propose a model-based RL framework with formal stability guarantees, Krasovskii Constrained RL (KCRL), that adopts Krasovskii's family of Lyapunov functions as a stability constraint. The proposed method learns the system dynamics up to a confidence interval using a feature representation, e.g., random Fourier features. It then solves a constrained policy optimization problem, with a stability constraint based on Krasovskii's method, via a primal-dual approach to recover a stabilizing policy. We show that KCRL is guaranteed to learn a stabilizing policy in a finite number of interactions with the underlying unknown system. We also derive a sample complexity upper bound for stabilizing unknown nonlinear dynamical systems via the KCRL framework.
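To make the two ingredients of the pipeline concrete, the following is a minimal, illustrative sketch (not the paper's implementation): (1) fitting the unknown dynamics with random Fourier features by ridge regression, and (2) a primal-dual update of a linear policy under a sampled Krasovskii-style decrease surrogate on the learned model. The toy system, the Lyapunov weighting P, and the helper names (rff_features, stability_violation, model_step) are illustrative assumptions, not quantities defined in the abstract.

    # Illustrative sketch of KCRL-style model learning and primal-dual policy search.
    import numpy as np

    rng = np.random.default_rng(0)
    n_x, n_u, D = 2, 1, 50

    def rff_features(z, W, b):
        # Random Fourier feature map: phi(z) = sqrt(2/D) * cos(W z + b).
        return np.sqrt(2.0 / D) * np.cos(W @ z + b)

    # --- 1) Learn x_{t+1} ~ A_hat @ phi(x_t, u_t) by ridge regression on rollouts ---
    W = rng.normal(size=(D, n_x + n_u))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)

    def true_dynamics(x, u):            # stand-in for the unknown system
        return np.array([x[1], -0.5 * np.sin(x[0]) + u[0]])

    X, Y = [], []
    x = rng.normal(size=n_x)
    for _ in range(500):
        u = rng.normal(size=n_u)
        x_next = true_dynamics(x, u)
        X.append(rff_features(np.concatenate([x, u]), W, b))
        Y.append(x_next)
        x = x_next
    Phi, Y = np.stack(X), np.stack(Y)
    A_hat = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(D), Phi.T @ Y).T   # (n_x, D)

    def model_step(x, u):
        return A_hat @ rff_features(np.concatenate([x, u]), W, b)

    # --- 2) Primal-dual optimization of a linear policy u = K x ---
    P = np.eye(n_x)                     # assumed weighting in V(x) = f(x)^T P f(x)
    xs = [rng.normal(size=n_x) for _ in range(20)]

    def stability_violation(K, eps=1e-2):
        # Sampled surrogate for the Krasovskii decrease condition along the model.
        total = 0.0
        for x0 in xs:
            f0 = model_step(x0, K @ x0) - x0
            x1 = x0 + f0
            f1 = model_step(x1, K @ x1) - x1
            total += max(0.0, f1 @ P @ f1 - (1.0 - eps) * (f0 @ P @ f0))
        return total / len(xs)

    def cost(K):
        # Quadratic surrogate for negative reward over a short model rollout.
        x, c = np.ones(n_x), 0.0
        for _ in range(20):
            u = K @ x
            c += x @ x + u @ u
            x = model_step(x, u)
        return c

    K, dual, h = np.zeros((n_u, n_x)), 0.0, 1e-4
    for _ in range(100):
        L0 = cost(K) + dual * stability_violation(K)
        grad = np.zeros_like(K)
        for i in range(n_u):            # primal step: finite-difference descent on Lagrangian
            for j in range(n_x):
                Kp = K.copy(); Kp[i, j] += h
                grad[i, j] = (cost(Kp) + dual * stability_violation(Kp) - L0) / h
        K -= 1e-3 * grad
        dual = max(0.0, dual + stability_violation(K))   # dual ascent on constraint violation

    print("policy gain K:", K, "\nfinal constraint violation:", stability_violation(K))

The finite-difference gradient and the sampled decrease surrogate are placeholders chosen to keep the sketch self-contained; KCRL's actual policy-gradient estimator, confidence-interval construction, and Krasovskii constraint are as specified in the paper.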