Deep learning has had a far-reaching impact on robotics. In particular, deep reinforcement learning algorithms have been highly effective in synthesizing neural-network controllers for a wide range of tasks. However, despite this empirical success, these controllers still lack theoretical guarantees on their performance, such as Lyapunov stability (i.e., all trajectories of the closed-loop system are guaranteed to converge to a goal state under the control policy). This is in stark contrast to traditional model-based controller design, where principled approaches (like LQR) can synthesize stable controllers with provable guarantees. To address this gap, we propose a generic method to synthesize a Lyapunov-stable neural-network controller, together with a neural-network Lyapunov function that simultaneously certifies its stability. Our approach formulates the verification of the Lyapunov conditions as a mixed-integer linear program (MIP). Our MIP verifier either certifies the Lyapunov conditions, or generates counterexamples that are used to improve the candidate controller and the Lyapunov function. We also present an optimization program to compute an inner approximation of the region of attraction of the closed-loop system. We apply our approach to robots including an inverted pendulum, a 2D quadrotor, and a 3D quadrotor, and show that our neural-network controller outperforms a baseline LQR controller. The code is open-sourced at \url{https://github.com/StanfordASL/neural-network-lyapunov}.
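For concreteness, the certificate takes the standard discrete-time Lyapunov form (a sketch; the verification region $\mathcal{S}$ and the rate constant $\epsilon$ are illustrative assumptions, not quoted from the paper). For the closed-loop dynamics $x_{t+1} = f(x_t, \pi(x_t))$ with goal state $x^*$, the verifier checks
\begin{align*}
V(x^*) &= 0, \\
V(x) &> 0 && \forall x \in \mathcal{S} \setminus \{x^*\}, \\
V\bigl(f(x, \pi(x))\bigr) - V(x) &\le -\epsilon\, V(x) && \forall x \in \mathcal{S},
\end{align*}
where $\pi$ is the neural-network controller, $V$ the neural-network Lyapunov function, and $\epsilon > 0$ a fixed convergence rate. With ReLU networks and piecewise-affine dynamics, each of these conditions can be encoded with mixed-integer linear constraints, which is what allows the verification to be posed as an MIP.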
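The following is a minimal Python/PyTorch sketch of the counterexample-guided loop implied above. The helpers \texttt{mip\_verify} (solves the MIP and returns a worst-case violating state, or \texttt{None} if certified) and \texttt{dynamics} are hypothetical placeholders, not the repository's API; the loss terms mirror the Lyapunov conditions stated above.
\begin{verbatim}
import torch

def synthesize(controller, lyapunov, dynamics, mip_verify,
               eps=0.01, lr=1e-3, max_iters=1000):
    # Counterexample-guided synthesis sketch. `mip_verify` is a
    # hypothetical helper that solves the MIP over the verification
    # region and returns a violating state, or None if certified.
    params = list(controller.parameters()) + list(lyapunov.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    cex = []
    for _ in range(max_iters):
        x = mip_verify(controller, lyapunov, dynamics)
        if x is None:
            return True   # Lyapunov conditions certified on the region
        cex.append(x)
        xs = torch.stack(cex)
        x_next = dynamics(xs, controller(xs))
        # Penalize violations of the decrease condition
        #   V(f(x, pi(x))) - V(x) <= -eps * V(x)
        # and of positivity V(x) > 0 on accumulated counterexamples.
        decrease = torch.relu(lyapunov(x_next) - (1 - eps) * lyapunov(xs))
        positivity = torch.relu(-lyapunov(xs))
        loss = decrease.mean() + positivity.mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return False  # iteration budget exhausted without a certificate
\end{verbatim}
Once the loop returns a certificate, an inner approximation of the region of attraction can, as is standard, be taken as a sublevel set of $V$ contained in the verified region, consistent with the optimization program mentioned in the abstract.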