We propose a Model Predictive Control (MPC) method for collision-free navigation that uses amortized variational inference to approximate the distribution of optimal control sequences by training a normalizing flow conditioned on the start, goal, and environment. This representation allows us to learn a distribution that accounts for both the dynamics of the robot and complex obstacle geometries. We then sample from this distribution to produce control sequences that are likely to be both goal-directed and collision-free, as part of our proposed sampling-based MPC method, FlowMPPI. However, when this method is deployed, the robot may encounter an out-of-distribution (OOD) environment, i.e., one that differs radically from those used in training. In such cases, the learned flow cannot be trusted to produce low-cost control sequences. To generalize our method to OOD environments, we also present an approach that projects the representation of the environment as part of the MPC process. This projection changes the environment representation to be more in-distribution while also optimizing trajectory quality in the true environment. Our simulation results on a 2D double integrator and a 3D 12-DoF underactuated quadrotor suggest that FlowMPPI with projection outperforms state-of-the-art MPC baselines on both in-distribution and OOD environments, including OOD environments generated from real-world data.
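To make the sampling-based MPC loop described above concrete, the following is a minimal sketch of one generic MPPI update on a 2D double integrator with a single circular obstacle. It is an illustration under stated assumptions, not the FlowMPPI implementation: Gaussian perturbations of a nominal control sequence stand in for samples from the learned normalizing flow, and the function names (`rollout`, `trajectory_cost`, `mppi_step`), the cost terms, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def rollout(x0, controls, dt=0.1):
    """Simulate a 2D double integrator with semi-implicit Euler steps.

    State is [px, py, vx, vy]; control is [ax, ay].
    controls has shape (K, H, 2); returns states of shape (K, H, 4).
    """
    K, H, _ = controls.shape
    x = np.tile(np.asarray(x0, dtype=float), (K, 1))
    states = np.empty((K, H, 4))
    for t in range(H):
        x[:, 2:] += dt * controls[:, t]   # update velocity from acceleration
        x[:, :2] += dt * x[:, 2:]         # update position from new velocity
        states[:, t] = x
    return states

def trajectory_cost(states, goal, obs_center, obs_radius):
    """Assumed cost: summed distance to goal plus a large collision penalty."""
    pos = states[:, :, :2]
    goal_cost = np.linalg.norm(pos - goal, axis=-1).sum(axis=1)
    dist_to_obs = np.linalg.norm(pos - obs_center, axis=-1)
    collided = (dist_to_obs < obs_radius).any(axis=1)
    return goal_cost + 1e3 * collided

def mppi_step(x0, nominal, goal, obs_center, obs_radius, rng,
              num_samples=256, sigma=1.0, temperature=1.0):
    """One MPPI update: sample, roll out, weight by exp(-cost / temperature)."""
    H = nominal.shape[0]
    # Assumption: Gaussian noise replaces sampling from a learned flow.
    noise = rng.normal(0.0, sigma, size=(num_samples, H, 2))
    controls = nominal[None] + noise
    costs = trajectory_cost(rollout(x0, controls), goal, obs_center, obs_radius)
    # Subtract the minimum cost for numerical stability before exponentiating.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # The new nominal sequence is the weighted average of sampled sequences.
    return np.einsum("k,khd->hd", weights, controls)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = np.zeros(4)
    goal = np.array([2.0, 2.0])
    nominal = np.zeros((20, 2))
    nominal = mppi_step(x0, nominal, goal, np.array([1.0, 1.0]), 0.3, rng)
    print(nominal.shape)
```

In a receding-horizon loop, the first control of the returned sequence would be applied to the robot, the sequence shifted by one step, and `mppi_step` called again from the new state. A flow-based sampler would replace the Gaussian `noise` line with samples drawn conditionally on the start, goal, and environment.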