The combination of learning methods with Model Predictive Control (MPC) has attracted significant attention in the recent literature. The hope is to reduce the reliance of MPC schemes on accurate models and to tap into fast-developing machine learning and reinforcement learning tools so as to exploit the growing amount of data available for many systems. In particular, the combination of reinforcement learning and MPC has been proposed as a viable and theoretically justified approach for introducing explainable, safe, and stable policies in reinforcement learning. However, a formal theory detailing how the safety and stability of an MPC-based policy can be maintained through the parameter updates delivered by the learning tools is still lacking. This paper addresses this gap. The theory is developed for the generic Robust MPC case and applied in simulation to the robust tube-based linear MPC case, where it is fairly easy to deploy in practice. The paper focuses on Reinforcement Learning as the learning tool, but the results apply to any learning method that updates the MPC parameters online.
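As a rough illustration of what "updating the MPC parameters online" can mean in practice, the sketch below (Python/NumPy, not taken from the paper) runs an unconstrained linear MPC whose stage-cost weight theta is adjusted by a finite-difference, RL-style gradient of the observed closed-loop cost and then projected onto an interval [theta_min, theta_max], standing in for the set of parameters over which stability and safety certificates are assumed to hold. The dynamics, horizon, cost structure, and the interval itself are all illustrative assumptions, not the paper's construction.

```python
# Minimal conceptual sketch (not the paper's algorithm): an MPC policy for a
# linear system whose cost parameter theta is updated online by an RL-style
# gradient step, with a projection keeping theta in a set where the nominal
# controller is assumed to remain stabilizing. All constants are illustrative.
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # double-integrator-like dynamics
B = np.array([[0.005], [0.1]])
N = 10                                    # prediction horizon

def mpc_action(x0, theta):
    """Unconstrained finite-horizon MPC: min sum_k theta*||x_k||^2 + ||u_k||^2.
    Solved in batch least-squares form; returns the first input."""
    n, m = A.shape[0], B.shape[1]
    # Prediction matrices: stacked states X = Sx @ x0 + Su @ U
    Sx = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(N)])
    Su = np.zeros((N * n, N * m))
    for k in range(N):
        for j in range(k + 1):
            Su[k*n:(k+1)*n, j*m:(j+1)*m] = np.linalg.matrix_power(A, k - j) @ B
    Q = theta * np.eye(N * n)             # state-cost weight scaled by theta
    R = np.eye(N * m)                     # input-cost weight
    H = Su.T @ Q @ Su + R
    g = Su.T @ Q @ Sx @ x0
    U = -np.linalg.solve(H, g)            # optimal stacked input sequence
    return U[:m]                          # receding-horizon: apply first input

def rollout_cost(theta, x0, steps=30):
    """Closed-loop cost of running the MPC policy from x0 (illustrative)."""
    x, cost = x0.copy(), 0.0
    for _ in range(steps):
        u = mpc_action(x, theta)
        cost += x @ x + u @ u
        x = A @ x + B @ u
    return cost

# RL-style update: finite-difference gradient of the closed-loop cost,
# followed by a projection onto [theta_min, theta_max] -- a stand-in for
# restricting the update to parameters preserving stability/safety.
theta, theta_min, theta_max, lr, eps = 1.0, 0.1, 10.0, 0.05, 1e-2
x0 = np.array([1.0, 0.0])
for it in range(20):
    grad = (rollout_cost(theta + eps, x0) - rollout_cost(theta - eps, x0)) / (2 * eps)
    theta = float(np.clip(theta - lr * grad, theta_min, theta_max))
print("learned theta:", theta)
```

The projection step is only a placeholder for the kind of condition the paper develops: in the tube-based linear MPC setting, one would instead verify that the updated parameters keep the terminal ingredients and constraint tightening valid before accepting the update.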