We propose new methods for learning control policies and neural network Lyapunov functions for nonlinear control problems, with provable guarantees of stability. The framework consists of a learner that attempts to find the control and Lyapunov functions, and a falsifier that finds counterexamples to quickly guide the learner toward solutions. The procedure terminates when the falsifier finds no counterexample, in which case the controlled nonlinear system is provably stable. The approach significantly simplifies the process of Lyapunov control design, provides an end-to-end correctness guarantee, and can obtain much larger regions of attraction than existing methods such as LQR and SOS/SDP. We demonstrate in experiments how the new methods obtain high-quality solutions for challenging control problems.
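The learner–falsifier loop described above can be sketched in miniature. The following is an illustrative toy, not the paper's implementation: it uses a 1-D linear system dx/dt = x + u with a linear controller u = -k·x and a quadratic Lyapunov candidate V(x) = w·x², and a grid-search falsifier in place of the formal falsifier (which in practice would be an SMT/delta-decision solver to obtain the actual guarantee). All function names and parameters here are hypothetical.

```python
import numpy as np

def f(x, k):
    """Closed-loop dynamics of the toy system: dx/dt = x + u with u = -k*x."""
    return x - k * x

def lyapunov(x, w):
    """Quadratic Lyapunov candidate V(x) = w * x^2."""
    return w * x**2

def lie_derivative(x, w, k):
    """dV/dt along trajectories: (dV/dx) * f(x) = 2*w*x * (1-k)*x."""
    return 2 * w * x * f(x, k)

def falsifier(w, k, domain=(-2.0, 2.0), n=401):
    """Search a grid for a state violating the Lyapunov conditions
    V(x) > 0 and dV/dt < 0 for x != 0.  (A stand-in for a formal
    solver; a grid search alone gives no soundness guarantee.)"""
    for x in np.linspace(*domain, n):
        if abs(x) < 1e-6:
            continue  # conditions are only required away from the origin
        if lyapunov(x, w) <= 0 or lie_derivative(x, w, k) >= 0:
            return x  # counterexample found
    return None

def learner_falsifier_loop(w=1.0, k=0.0, max_iters=50):
    """Alternate learning and falsification: strengthen the controller
    until the falsifier returns no counterexample."""
    for _ in range(max_iters):
        cex = falsifier(w, k)
        if cex is None:
            return w, k  # no counterexample: candidate certified on the grid
        k += 0.5  # crude "learning" step guided by the counterexample
    raise RuntimeError("no certificate found within iteration budget")

w, k = learner_falsifier_loop()
```

In the actual framework the learner is a neural network trained on the counterexamples the falsifier returns, rather than a fixed gain update, but the termination logic is the same: the loop exits only once the falsifier certifies the Lyapunov conditions over the whole domain.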