Sampling-based Model Predictive Control (MPC) is a flexible control framework that can reason about non-smooth dynamics and cost functions. Recently, significant work has focused on using machine learning to improve the performance of MPC, often by learning or fine-tuning the dynamics model or cost function. In contrast, we focus on learning to optimize more effectively; in other words, on improving the update rule within MPC. We show that this is particularly useful in sampling-based MPC, where we often wish to minimize the number of samples for computational reasons. Unfortunately, computational efficiency comes at the cost of performance: fewer samples result in noisier updates. We show that we can contend with this noise by learning to update the control distribution more effectively, making better use of the few samples we have. Our learned controllers are trained via imitation learning to mimic an expert that has access to substantially more samples. We test the efficacy of our approach on multiple simulated robotics tasks in sample-constrained regimes and demonstrate that it can outperform an MPC controller with the same number of samples.
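To make the sample-noise trade-off concrete, the following is a minimal sketch of one update step of a standard sampling-based MPC method (an MPPI-style exponentiated-cost average), not the learned update rule described in the abstract. All function names, parameters, and defaults here are illustrative assumptions: with a small `n_samples`, the weighted average below becomes a noisy estimate, which is exactly the regime the learned controller targets.

```python
import numpy as np

def mppi_update(mean, rollout_cost, rng, n_samples=32, sigma=0.5,
                temperature=1.0):
    """One MPPI-style update of a Gaussian control distribution.

    mean: (horizon, action_dim) current mean control sequence.
    rollout_cost: maps a control sequence to a scalar cost (illustrative;
        in practice this rolls out the dynamics and sums stage costs).
    """
    horizon, action_dim = mean.shape
    # Sample perturbed control sequences around the current mean.
    noise = rng.normal(0.0, sigma, size=(n_samples, horizon, action_dim))
    candidates = mean[None] + noise
    costs = np.array([rollout_cost(u) for u in candidates])
    # Exponentiated-cost weights (softmin), shifted for numerical stability.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # New mean: old mean plus the weight-averaged perturbation.
    return mean + np.einsum('n,nha->ha', weights, noise)
```

With few samples, the returned mean fluctuates heavily from step to step; the abstract's approach replaces this fixed averaging rule with a learned update trained to imitate a many-sample expert.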