Robust model predictive control (MPC) is a well-established technique for model-based control under constraints and uncertainties. In classic robust tube-based MPC approaches, an open-loop control sequence is computed by periodically solving an online nominal MPC problem, which requires prior model information and frequent access to onboard computational resources. In this paper, we propose an efficient robust MPC solution based on receding-horizon reinforcement learning, called r-LPC, for unknown nonlinear systems with state constraints and disturbances. The proposed r-LPC utilizes a Koopman operator-based prediction model obtained offline from pre-collected input-output datasets. Unlike classic tube-based MPC, in each prediction time interval of r-LPC, we use an actor-critic structure to learn a near-optimal feedback control policy rather than a control sequence. The resulting closed-loop control policy can be learned offline and deployed online, or learned online in an asynchronous manner. In the latter case, online learning can be activated whenever necessary; for instance, whenever the safety constraints are violated under the deployed policy. Closed-loop recursive feasibility, robustness, and asymptotic stability are proven in the presence of function approximation errors of the actor-critic networks. Simulation and experimental results on two nonlinear systems with unknown dynamics and disturbances demonstrate that our approach achieves better or comparable performance relative to tube-based MPC and LQR, and outperforms a recently developed actor-critic learning approach.
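To make the data-driven prediction step concrete, the sketch below shows one common way a Koopman operator-based predictor can be identified offline from pre-collected input-output data: an EDMD-style least-squares fit of a lifted linear model. The lifting function `psi`, the function names, and the data layout are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Hypothetical lifting: map a measured output y to a feature vector z = psi(y).
# The choice of observables here is an assumption for illustration only.
def psi(y):
    return np.concatenate([y, np.sin(y), np.cos(y), [1.0]])

def fit_koopman_predictor(Y, U):
    """Fit a lifted linear predictor  z_{k+1} ~ A z_k + B u_k  by least squares
    (EDMD with inputs) from pre-collected input-output data.

    Y: array of shape (N+1, n_y) -- measured outputs along a trajectory
    U: array of shape (N, n_u)   -- applied inputs
    """
    Z = np.array([psi(y) for y in Y])      # lifted states, shape (N+1, n_z)
    Zk, Zk1 = Z[:-1], Z[1:]                # current / next lifted states
    G = np.hstack([Zk, U])                 # regressors, shape (N, n_z + n_u)
    AB, *_ = np.linalg.lstsq(G, Zk1, rcond=None)
    n_z = Zk.shape[1]
    A, B = AB[:n_z].T, AB[n_z:].T
    return A, B

def predict(A, B, y0, inputs):
    """Roll the identified lifted-linear predictor forward over a horizon."""
    z = psi(y0)
    traj = [z]
    for u in inputs:
        z = A @ z + B @ u
        traj.append(z)
    return np.array(traj)
```

Such a lifted linear model can then serve as the prediction model over each receding-horizon interval, within which the actor-critic structure learns the feedback control policy.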