Learning Sampling Distributions for Model Predictive Control

From arXiv. Accepted at the Conference on Robot Learning (CoRL), 2022. The main paper is 9 pages with 4 figures; the appendix is 12 pages with 11 figures and 1 table.

Sampling-based methods have become a cornerstone of contemporary approaches to Model Predictive Control (MPC), as they make no restrictions on the differentiability of the dynamics or cost function and are straightforward to parallelize. However, their efficacy is highly dependent on the quality of the sampling distribution itself, which is often assumed to be simple, like a Gaussian. This restriction can result in samples which are far from optimal, leading to poor performance. Recent work has explored improving the performance of MPC by sampling in a learned latent space of controls. However, these methods ultimately perform all MPC parameter updates and warm-starting between time steps in the control space. This requires us to rely on a number of heuristics for generating samples and updating the distribution and may lead to sub-optimal performance. Instead, we propose to carry out all operations in the latent space, allowing us to take full advantage of the learned distribution. Specifically, we frame the learning problem as bi-level optimization and show how to train the controller with backpropagation-through-time. By using a normalizing flow parameterization of the distribution, we can leverage its tractable density to avoid requiring differentiability of the dynamics and cost function. Finally, we evaluate the proposed approach on simulated robotics tasks and demonstrate its ability to surpass the performance of prior methods and scale better with a reduced number of samples.
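To make the abstract's contrast concrete, here is a minimal NumPy sketch of a sampling-based MPC loop in which sampling, the MPPI-style weighted mean update, and the warm-start shift all happen in a latent space. It is not the authors' implementation. The toy double-integrator dynamics, the cost, and all constants are hypothetical. `flow_forward` is a fixed elementwise `tanh` map standing in for a learned normalizing flow. The paper instead learns the distribution through bi-level optimization with backpropagation-through-time, which is not shown here. Only forward rollouts are evaluated, so the deliberately non-smooth terminal cost needs no gradient.

```python
import numpy as np

# Hypothetical toy setup: 1-D double integrator, state = (position, velocity).
DT = 0.1
U_MAX = 2.0


def rollout_costs(u, x0):
    """Simulate all sampled control sequences at once; u has shape (n_samples, horizon).

    Only forward simulation is used, so neither the dynamics nor the cost
    needs to be differentiable (the terminal penalty below is non-smooth).
    """
    n = u.shape[0]
    pos = np.full(n, x0[0])
    vel = np.full(n, x0[1])
    cost = np.zeros(n)
    for t in range(u.shape[1]):
        vel = vel + DT * u[:, t]
        pos = pos + DT * vel
        cost += pos**2 + 0.1 * vel**2 + 0.01 * u[:, t] ** 2
    cost += 10.0 * np.minimum(np.abs(pos), 1.0)  # non-smooth terminal penalty
    return cost


def flow_forward(z):
    """Hypothetical stand-in for a learned flow: invertible map from latents to bounded controls."""
    return U_MAX * np.tanh(z)


def latent_mppi_step(z_mean, x0, rng, n_samples=64, sigma=0.5, temperature=1.0):
    """One MPPI-style update where sampling and the mean update happen in latent space."""
    z = z_mean + sigma * rng.standard_normal((n_samples, z_mean.shape[0]))
    costs = rollout_costs(flow_forward(z), x0)
    # Subtracting the minimum cost keeps the exponentials numerically stable.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    return weights @ z  # cost-weighted average of the latent samples


def warm_start(z_mean):
    """Shift the latent plan one step forward, repeating the last entry."""
    return np.concatenate([z_mean[1:], z_mean[-1:]])


def run_closed_loop(steps=30, iters=5, horizon=20, seed=0):
    """Run receding-horizon control from x0 = (1, 0) and return the final state."""
    rng = np.random.default_rng(seed)
    x = np.array([1.0, 0.0])
    z_mean = np.zeros(horizon)
    for _ in range(steps):
        for _ in range(iters):
            z_mean = latent_mppi_step(z_mean, x, rng)
        u0 = flow_forward(z_mean[0])  # apply only the first control
        vel = x[1] + DT * u0
        x = np.array([x[0] + DT * vel, vel])
        z_mean = warm_start(z_mean)  # warm-start in latent space, not control space
    return x


if __name__ == "__main__":
    print(run_closed_loop())
```

Because `flow_forward` is nonlinear, averaging the latent samples and then mapping the result through the flow is generally not the same as averaging the controls directly. Keeping both the update and the warm-start shift in latent space is the distinction the abstract draws against prior work, which performs these operations in the control space.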
