We generalize the derivation of model predictive path integral control (MPPI) to allow for a single joint distribution across the controls in the control sequence. This reformulation allows adaptive importance sampling (AIS) algorithms to be incorporated into the original importance sampling step while retaining the benefits of MPPI, such as compatibility with arbitrary system dynamics and cost functions. We demonstrate the benefit of optimizing the proposal distribution with AIS at each control step in simulated environments, including controlling multiple cars around a track. The new algorithm is more sample efficient than MPPI, achieving better performance with fewer samples, and this performance gap grows as the dimension of the action space increases. Simulation results suggest that the new algorithm can be used as an anytime algorithm, increasing the value of the control at each iteration rather than relying on a large set of samples.
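To make the idea concrete, the following is a minimal sketch of one control step that samples whole control sequences from a single joint Gaussian and refines that proposal over a few AIS iterations. It is not the paper's implementation: the abstract does not say which AIS variant is used, so weighted moment matching is assumed here. The toy double-integrator dynamics, the quadratic cost, the zero-mean base distribution, and the names `rollout_costs`, `log_gauss`, and `ais_mppi_step` are likewise illustrative assumptions.

```python
import numpy as np


def rollout_costs(controls, x0, dt=0.1):
    """Trajectory cost of each control sequence (rows) on a toy 1-D
    double integrator. Dynamics and cost are assumptions for illustration."""
    controls = np.atleast_2d(controls)
    n = controls.shape[0]
    pos = np.full(n, float(x0[0]))
    vel = np.full(n, float(x0[1]))
    cost = np.zeros(n)
    for t in range(controls.shape[1]):
        vel = vel + dt * controls[:, t]
        pos = pos + dt * vel
        cost += pos**2 + 0.1 * vel**2
    return cost


def log_gauss(x, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of x."""
    d = x.shape[1]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mean).T)  # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (np.sum(z**2, axis=0) + log_det + d * np.log(2.0 * np.pi))


def ais_mppi_step(x0, horizon=10, n_samples=256, n_iters=5,
                  lam=1.0, sigma=1.0, seed=0):
    """One control step: adapt a joint Gaussian proposal over the whole
    control sequence using self-normalized importance weights."""
    rng = np.random.default_rng(seed)
    base_mean = np.zeros(horizon)              # assumed base distribution
    base_cov = sigma**2 * np.eye(horizon)
    mean, cov = base_mean.copy(), base_cov.copy()
    history = []
    for _ in range(n_iters):
        # Sample full control sequences from the current joint proposal.
        chol = np.linalg.cholesky(cov)
        samples = mean + rng.standard_normal((n_samples, horizon)) @ chol.T
        costs = rollout_costs(samples, x0)
        # Weight = exp(-cost / lam) * base density / proposal density.
        log_w = (-costs / lam
                 + log_gauss(samples, base_mean, base_cov)
                 - log_gauss(samples, mean, cov))
        log_w -= log_w.max()                   # numerical stability
        w = np.exp(log_w)
        w /= w.sum()
        # Assumed AIS update: match the weighted mean and covariance.
        mean = w @ samples
        diff = samples - mean
        cov = (w[:, None] * diff).T @ diff + 1e-3 * np.eye(horizon)
        history.append(float(rollout_costs(mean, x0)[0]))
    return mean, cov, history
```

In a receding-horizon loop one would apply the first element of the returned mean and repeat from the next state. Because every iteration yields a usable mean sequence, the loop can be stopped early, which is the sense in which such a scheme can act as an anytime algorithm.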