The problem of path planning has been studied for decades. Classic planning pipelines, which chain perception, mapping, and path searching, can suffer from latency and compounding errors between modules. While recent studies have demonstrated that end-to-end learning methods achieve high planning efficiency, these methods often struggle to match the generalization ability of classic approaches across different environments. Moreover, end-to-end policy training often requires large amounts of labeled data or many training iterations to converge. In this paper, we present a novel Imperative Learning (IL) approach. This approach leverages a differentiable cost map to provide implicit supervision during policy training, eliminating the need for demonstrations or labeled trajectories. Furthermore, the policy training adopts a Bi-Level Optimization (BLO) process, which combines network updates with metric-based trajectory optimization to generate a smooth, collision-free path toward the goal from a single depth measurement. The proposed method allows task-level costs of predicted trajectories to be backpropagated through all components, updating the network by direct gradient descent. In our experiments, the method demonstrates around 4x faster planning than the classic approach and robustness against localization noise. Additionally, the IL approach enables the planner to generalize to various unseen environments, yielding an overall 26-87% improvement in SPL over baseline learning methods.
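The lower level of the BLO process, metric-based trajectory optimization against a differentiable cost map, can be illustrated with a minimal sketch. All function names and cost weights below are hypothetical, and finite-difference gradients stand in for the autograd backpropagation the paper describes; the sketch only shows how a collision, smoothness, and goal cost over waypoints can be minimized by direct gradient descent.

```python
import numpy as np

def bilinear_cost(cost_map, pts):
    # Differentiable per-waypoint traversal cost via bilinear interpolation,
    # so the map can be queried at continuous (x, y) coordinates.
    x, y = pts[:, 0], pts[:, 1]
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0
    c = (cost_map[y0, x0] * (1 - wx) * (1 - wy)
         + cost_map[y0, x1] * wx * (1 - wy)
         + cost_map[y1, x0] * (1 - wx) * wy
         + cost_map[y1, x1] * wx * wy)
    return c.sum()

def trajectory_cost(cost_map, pts, goal, w_smooth=1.0, w_goal=1.0):
    # Task-level cost: obstacle traversal + smoothness + distance to goal.
    obstacle = bilinear_cost(cost_map, pts)
    smooth = np.sum(np.diff(pts, axis=0) ** 2)   # penalize jerky segments
    goal_term = np.sum((pts[-1] - goal) ** 2)    # pull the endpoint to the goal
    return obstacle + w_smooth * smooth + w_goal * goal_term

def optimize_trajectory(cost_map, pts, goal, lr=0.05, iters=200, eps=1e-4):
    # Gradient descent on waypoint positions; central finite differences
    # stand in for autograd backprop through the cost map.
    pts = pts.copy()
    bounds = np.array(cost_map.shape)[::-1] - 1.001  # keep lookups in-grid
    for _ in range(iters):
        grad = np.zeros_like(pts)
        for idx in np.ndindex(pts.shape):
            d = np.zeros_like(pts)
            d[idx] = eps
            grad[idx] = (trajectory_cost(cost_map, pts + d, goal)
                         - trajectory_cost(cost_map, pts - d, goal)) / (2 * eps)
        pts = np.clip(pts - lr * grad, 0.0, bounds)
    return pts
```

In the full bi-level scheme, the upper level would replace the waypoint variables with the outputs of a network conditioned on a depth measurement, and the same task-level cost gradient would flow back into the network weights.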