Optimization of the Model Predictive Control Meta-Parameters Through Reinforcement Learning

Source: arXiv. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Model predictive control (MPC) is increasingly being considered for the control of fast systems and embedded applications. However, MPC poses significant challenges for such systems. Its high computational complexity leads to high power consumption by the control algorithm, which can account for a significant share of the energy resources in battery-powered embedded systems. The MPC parameters must also be tuned, which is largely a trial-and-error process that strongly affects the control performance, the robustness, and the computational complexity of the controller. In this paper, we propose a novel framework in which any parameter of the control algorithm can be jointly tuned using reinforcement learning (RL), with the goal of simultaneously optimizing the control performance and the power usage of the control algorithm. We propose the novel idea of optimizing the meta-parameters of MPC with RL, i.e., parameters affecting the structure of the MPC problem as opposed to the solution to a given problem. Our control algorithm is based on event-triggered MPC, in which we learn when the MPC should be recomputed, combined with a dual-mode scheme in which a linear state feedback control law is applied between MPC computations. We formulate a novel mixture-distribution policy and show that joint optimization achieves improvements that do not appear when the same parameters are optimized in isolation. We demonstrate our framework on the inverted pendulum control task, reducing the total computation time of the control system by 36% while also improving the control performance by 18.4% over the best-performing MPC baseline.
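
To make the event-triggered, dual-mode idea concrete, here is a minimal Python sketch. It is an illustration under several assumptions and not the authors' implementation. The plant is a hypothetical scalar linear system with a deliberate model mismatch. The "MPC solve" is replaced by an unconstrained finite-horizon LQ plan. The trigger is a hand-written deviation threshold, whereas the paper learns the recomputation decision with RL. All function names and numeric values are invented for the example.

```python
def lq_gains(a, b, q, r, horizon):
    """Backward Riccati recursion for x+ = a*x + b*u with stage cost q*x^2 + r*u^2."""
    p = q  # terminal cost weight (assumed equal to q)
    gains = []
    for _ in range(horizon):
        k = (a * b * p) / (r + b * b * p)
        p = q + a * a * p - a * b * p * k
        gains.append(k)
    gains.reverse()  # gains[i] is the gain for step i of the plan
    return gains


def plan_mpc(x0, a, b, q, r, horizon):
    """Stand-in for the MPC solve: planned inputs and predicted states."""
    xs, us = [x0], []
    for k in lq_gains(a, b, q, r, horizon):
        u = -k * xs[-1]
        us.append(u)
        xs.append(a * xs[-1] + b * u)
    return us, xs


def simulate(x0, steps, horizon, threshold,
             a_model=1.0, a_true=1.05, b=0.5, q=1.0, r=0.1):
    """Run the loop; return (accumulated cost, number of MPC solves)."""
    # Feedback gain used between solves (long-horizon LQ gain on the model).
    k_fb = lq_gains(a_model, b, q, r, 50)[0]
    x, cost, n_solves = x0, 0.0, 0
    us, xs, i = [], [], horizon  # i == horizon forces a solve at the first step
    for _ in range(steps):
        # Hypothetical trigger: recompute when the plan is exhausted or the
        # state has drifted too far from the prediction. In the paper this
        # decision is learned by RL instead of being a fixed threshold.
        if i >= horizon or abs(x - xs[i]) > threshold:
            us, xs = plan_mpc(x, a_model, b, q, r, horizon)
            n_solves += 1
            i = 0
        # Dual mode: planned input plus linear feedback on the deviation.
        u = us[i] - k_fb * (x - xs[i])
        cost += q * x * x + r * u * u
        x = a_true * x + b * u  # true plant differs from the model
        i += 1
    return cost, n_solves
```

Calling `simulate(1.0, 20, 5, threshold)` with a very large threshold recomputes the plan only when it runs out, while a threshold of zero recomputes whenever the state deviates from the prediction at all. Comparing the returned cost and solve count across thresholds shows the kind of trade-off between control performance and computation that the paper's framework optimizes jointly.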
