With a growing interest in data-driven control techniques, Model Predictive Control (MPC) provides an opportunity to exploit the surplus of data reliably, particularly while taking safety and stability into account. In many real-world and industrial applications, it is typical to have an existing control strategy, for instance, execution by a human operator. The objective of this work is to improve upon this unknown, safe but suboptimal policy by learning a new controller that retains safety and stability. Learning how to be safe is achieved directly from data and from knowledge of the system constraints. The proposed algorithm alternately learns the terminal cost and updates the MPC parameters according to a stability metric. The terminal cost is constructed as a Lyapunov function neural network with the aim of recovering or extending the stable region of the initial demonstrator using a short prediction horizon. Theorems that characterize the stability and performance of the learned MPC in the presence of model uncertainties and sub-optimality due to function approximation are presented. The efficacy of the proposed algorithm is demonstrated on non-linear continuous control tasks with soft constraints. In practice, the proposed approach can also improve upon the initial demonstrator and achieve better stability than popular reinforcement learning baselines.
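The abstract does not spell out the algorithm, so the following is only a toy illustration of the ideas it names, built on assumptions of our own rather than the paper's method. It assumes a hypothetical scalar, open-loop unstable, input-affine system; a one-step MPC whose terminal cost V(x) = (θ² + ε)x² is a quadratic stand-in for a Lyapunov function neural network (positive definite for any parameter θ, which is the property such networks are usually built to guarantee); and a "stability metric" defined here as the fraction of sampled states where V decreases by at least αx² along the closed loop. Choosing θ by grid search over this metric is a crude substitute for actually training the terminal cost.

```python
# Toy illustration only: the system, terminal cost, and metric are hypothetical.
A, B, C = 1.2, 1.0, 0.5   # x+ = A*x + C*x**3 + B*u (open-loop unstable)
R = 1.0                   # input weight in the stage cost
EPS = 1e-3                # keeps the terminal weight strictly positive
ALPHA = 0.01              # required Lyapunov decrease rate


def drift(x):
    """Autonomous part f(x) of the hypothetical input-affine dynamics."""
    return A * x + C * x ** 3


def step(x, u):
    """One step of the hypothetical dynamics."""
    return drift(x) + B * u


def terminal_cost(x, theta):
    """Lyapunov candidate V(x) = (theta**2 + EPS) * x**2, positive definite for any theta."""
    return (theta * theta + EPS) * x * x


def mpc_one_step(x, theta):
    """Horizon-1 MPC: minimise R*u**2 + V(f(x) + B*u) over u, in closed form.

    The state term Q*x**2 of the stage cost does not depend on u, so it is omitted.
    """
    p = theta * theta + EPS
    return -p * B * drift(x) / (R + p * B * B)


def stability_metric(theta, states):
    """Fraction of sampled states where V(x+) - V(x) <= -ALPHA * x**2 in closed loop."""
    ok = 0
    for x in states:
        x_next = step(x, mpc_one_step(x, theta))
        if terminal_cost(x_next, theta) - terminal_cost(x, theta) <= -ALPHA * x * x:
            ok += 1
    return ok / len(states)


def select_terminal_param(candidates, states):
    """Pick the terminal-cost parameter with the best stability metric (grid search)."""
    best = max(candidates, key=lambda theta: stability_metric(theta, states))
    return best, stability_metric(best, states)


# Sampled states on [-2, 2], excluding the equilibrium x = 0.
STATES = [i / 10 for i in range(-20, 21) if i != 0]

if __name__ == "__main__":
    for theta in (0.0, 1.0, 2.0):
        print(theta, stability_metric(theta, STATES))
    print(select_terminal_param([0.0, 1.0, 2.0], STATES))
```

With these toy numbers, θ = 0 gives a metric of 0.0 (the terminal cost is too weak to stabilize anything), θ = 1 gives 0.6 (the decrease condition holds only for |x| ≤ 1.2 on the grid), and θ = 2 gives 1.0, so the sampled region certified by V grows as the terminal cost is strengthened. This mirrors, in a very simplified form, the abstract's goal of extending the stable region with a short horizon.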