Stabilizing an unknown control system is one of the most fundamental problems in control systems engineering. In this paper, we provide a simple, model-free algorithm for stabilizing fully observed dynamical systems. While model-free methods have become increasingly popular in practice due to their simplicity and flexibility, stabilization via direct policy search has received surprisingly little attention. Our algorithm proceeds by solving a series of discounted LQR problems, where the discount factor is gradually increased. We prove that this method efficiently recovers a stabilizing controller for linear systems, and for smooth, nonlinear systems within a neighborhood of their equilibria. Our approach overcomes a significant limitation of prior work, namely the need for a pre-given stabilizing control policy. We empirically evaluate the effectiveness of our approach on common control benchmarks.
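To make the idea of solving a series of discounted LQR problems with a gradually increasing discount factor concrete, here is a minimal sketch on a scalar linear system x_{t+1} = a x_t + b u_t with cost sum_t gamma^t (q x_t^2 + r u_t^2). It is an illustration under stated assumptions and not the algorithm of the paper. It uses the known model (a, b) and exact policy iteration in place of model-free policy search, and the rule used to raise the discount factor is a hypothetical choice made for this example. The function names and default values are also illustrative.

```python
def evaluate_gain(a, b, q, r, k, gamma):
    """Discounted cost coefficient p of the policy u = -k*x (cost = p*x0^2).

    Returns None when the discounted cost is infinite.
    """
    c = a - b * k                      # closed-loop multiplier
    if gamma * c * c >= 1.0:
        return None
    return (q + r * k * k) / (1.0 - gamma * c * c)


def solve_discounted_lqr(a, b, q, r, gamma, k, iters=200):
    """Policy iteration for the discounted scalar LQR, warm-started at gain k.

    ASSUMPTION: uses the model (a, b); the paper uses model-free policy search.
    """
    for _ in range(iters):
        p = evaluate_gain(a, b, q, r, k, gamma)
        if p is None:
            raise ValueError("warm start has infinite discounted cost")
        # Greedy improvement step for the discounted problem.
        k = gamma * a * b * p / (r + gamma * b * b * p)
    return k


def discount_annealing(a, b, q=1.0, r=1.0, max_rounds=100):
    """Anneal gamma up to 1, starting from the zero gain k = 0.

    Returns the final gain and the list of discount factors visited.
    """
    k = 0.0
    # Small initial discount so that k = 0 has finite discounted cost.
    gamma = min(1.0, 0.5 / max(a * a, 1e-12))
    history = [gamma]
    for _ in range(max_rounds):
        k = solve_discounted_lqr(a, b, q, r, gamma, k)
        if gamma >= 1.0:
            return k, history          # undiscounted problem solved
        c = a - b * k
        # HYPOTHETICAL update rule: geometric mean of gamma and 1/c^2, which
        # keeps gamma_next * c^2 < 1 so the current gain remains a valid
        # warm start for the next, less heavily discounted problem.
        gamma = 1.0 if c == 0.0 else min(1.0, (gamma / (c * c)) ** 0.5)
        history.append(gamma)
    raise RuntimeError("discount factor did not reach 1")


# Open-loop unstable example: |a| > 1, so the zero gain is not stabilizing.
gain, gammas = discount_annealing(a=2.0, b=1.0)
print("final gain:", gain)
print("closed-loop multiplier:", 2.0 - 1.0 * gain)
print("discount factors:", gammas)
```

The point of the sketch is the warm start. The zero gain has finite cost only under a heavily discounted objective, and each solved problem yields a gain that remains admissible for a larger discount factor. Once the discount factor reaches 1, the returned gain stabilizes the original system, and no stabilizing controller had to be supplied in advance.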