We present differentiable predictive control (DPC), a method for learning constrained neural control policies for linear systems with probabilistic performance guarantees. We employ automatic differentiation to obtain direct policy gradients by backpropagating the model predictive control (MPC) loss function and constraint penalties through a differentiable closed-loop system dynamics model. We demonstrate that the proposed method can learn parametric constrained control policies to stabilize systems with unstable dynamics, track time-varying references, and satisfy nonlinear state and input constraints. In contrast with imitation learning-based approaches, our method does not depend on a supervisory controller. Most importantly, we demonstrate that, without losing performance, our method is scalable and computationally more efficient than implicit, explicit, and approximate MPC.

Under review at IEEE Transactions on Automatic Control.
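To make the training idea concrete, the following is a minimal sketch (not the authors' implementation) of the DPC principle described above: a neural policy is rolled out through a differentiable linear closed-loop model, and an MPC-style tracking loss plus constraint penalties is backpropagated to obtain direct policy gradients. All names, dimensions, dynamics matrices, and weights below are illustrative assumptions.

```python
# Illustrative sketch of DPC-style policy learning (assumed PyTorch implementation).
import torch
import torch.nn as nn

nx, nu, N = 2, 1, 20                        # state dim, input dim, prediction horizon
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])  # example discrete-time linear dynamics
B = torch.tensor([[0.0], [0.1]])

policy = nn.Sequential(                     # parametric control policy u = pi(x, r)
    nn.Linear(nx + nx, 32), nn.ReLU(),
    nn.Linear(32, nu),
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

x_max, u_max = 5.0, 1.0                     # example box constraints on states and inputs
Qx, Qu, Qc = 10.0, 0.1, 100.0               # tracking, control-effort, and penalty weights

for step in range(1000):
    x = 4.0 * (torch.rand(64, nx) - 0.5)    # batch of sampled initial conditions
    r = torch.zeros(64, nx)                 # reference (regulation to the origin)
    loss = 0.0
    for k in range(N):                      # differentiable closed-loop rollout
        u = policy(torch.cat([x, r], dim=-1))
        x = x @ A.T + u @ B.T               # simulate x_{k+1} = A x_k + B u_k
        loss = loss + Qx * ((x - r) ** 2).sum(-1).mean() \
                    + Qu * (u ** 2).sum(-1).mean() \
                    + Qc * torch.relu(x.abs() - x_max).sum(-1).mean() \
                    + Qc * torch.relu(u.abs() - u_max).sum(-1).mean()
    opt.zero_grad()
    loss.backward()                         # direct policy gradients via autodiff
    opt.step()
```

After training, the policy is evaluated online with a single forward pass per time step, which is the source of the computational advantage over solving an implicit MPC problem at every step.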