Neural network controllers have become popular in control tasks thanks to their flexibility and expressivity. Stability is a crucial property for safety-critical dynamical systems, and stabilizing a partially observed system often requires the controller to retain and process long-term memories of the past. We consider the important class of recurrent neural networks (RNNs) as dynamic controllers for nonlinear, uncertain, partially observed systems, and derive convex stability conditions based on integral quadratic constraints, the S-lemma, and sequential convexification. To ensure stability during the learning and control process, we propose a projected policy gradient method that iteratively enforces the stability conditions in a reparametrized space, taking advantage of mild additional information on the system dynamics. Numerical experiments show that our method learns stabilizing controllers while using fewer samples and achieving higher final performance compared with policy gradient.
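To make the projection idea concrete, below is a minimal Python sketch (not from the paper) of a projected policy gradient loop: a gradient ascent step on the controller parameters followed by a projection back onto a stability-certified parameter set. The helpers `rollout_fn`, `estimate_policy_gradient`, and `project_to_stability_set` are hypothetical placeholders; in particular, the projection shown is a simple norm-ball projection standing in for the convex program that the IQC/S-lemma stability conditions would define.

```python
import numpy as np

def estimate_policy_gradient(theta, rollout_fn, n_samples=8, sigma=0.1):
    """Two-point zeroth-order estimate of the policy gradient.
    This is a generic stand-in; the paper's gradient estimator may differ."""
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        eps = np.random.randn(*theta.shape)
        delta = rollout_fn(theta + sigma * eps) - rollout_fn(theta - sigma * eps)
        grad += (delta / (2.0 * sigma)) * eps
    return grad / n_samples

def project_to_stability_set(theta, radius=10.0):
    """Hypothetical projection oracle. In the paper this step would solve a
    convex program derived from the IQC/S-lemma stability conditions; here a
    Euclidean norm-ball projection is used purely as an illustrative stand-in."""
    norm = np.linalg.norm(theta)
    return theta if norm <= radius else theta * (radius / norm)

def projected_policy_gradient(theta0, rollout_fn, lr=1e-2, iters=100):
    """Gradient ascent on the estimated return, re-projecting after every
    update so the controller parameters stay stability-certified throughout."""
    theta = project_to_stability_set(np.asarray(theta0, dtype=float))
    for _ in range(iters):
        g = estimate_policy_gradient(theta, rollout_fn)
        theta = project_to_stability_set(theta + lr * g)
    return theta
```

Here `rollout_fn(theta)` is assumed to return a scalar estimate of the closed-loop return for controller parameters `theta`; the key design point is that the projection is applied after every gradient step, so stability is maintained during learning rather than only checked at the end.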