Planning an optimal route in a complex environment requires efficient reasoning about the surrounding scene. While human drivers prioritize important objects and ignore details not relevant to the decision, learning-based planners typically extract features from dense, high-dimensional grid representations containing all vehicle and road context information. In this paper, we propose PlanT, a novel approach for planning in the context of self-driving that uses a standard transformer architecture. PlanT is based on imitation learning with a compact object-level input representation. On the Longest6 benchmark for CARLA, PlanT outperforms all prior methods (matching the driving score of the expert) while being 5.3x faster than equivalent pixel-based planning baselines during inference. Combining PlanT with an off-the-shelf perception module provides a sensor-based driving system that is more than 10 points better in terms of driving score than the existing state of the art. Furthermore, we propose an evaluation protocol to quantify the ability of planners to identify relevant objects, providing insights regarding their decision-making. Our results indicate that PlanT can focus on the most relevant object in the scene, even when this object is geometrically distant.
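The compact object-level representation described above can be illustrated with a minimal sketch: each vehicle and route segment becomes one token (a short feature vector), and a single self-attention step mixes information across objects. The feature layout, function names, and identity Q/K/V projections here are illustrative assumptions for brevity, not PlanT's exact tokenization or its learned transformer weights.

```python
import numpy as np

# Hypothetical object-level scene encoding: each row is one token
# [x, y, yaw, length, width, type_flag], with type_flag 0 = vehicle,
# 1 = route segment. This layout is an assumption for illustration.
def object_tokens(vehicles, route_segments):
    toks = [list(v) + [0.0] for v in vehicles]
    toks += [list(r) + [1.0] for r in route_segments]
    return np.array(toks, dtype=np.float32)

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity
    Q/K/V projections for brevity; a real transformer learns these
    projection matrices and stacks multiple heads and layers."""
    d_k = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d_k)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # rows sum to 1
    return attn @ tokens, attn

# Two nearby vehicles and one route segment (x, y, yaw, length, width).
vehicles = [(5.0, 1.0, 0.0, 4.5, 2.0), (20.0, -2.0, 0.1, 4.5, 2.0)]
route = [(0.0, 0.0, 0.0, 10.0, 3.5)]
toks = object_tokens(vehicles, route)
out, attn = self_attention(toks)
print(toks.shape)  # three tokens, six features each
print(attn.shape)  # pairwise attention weights among the three objects
```

Because the input is a handful of tokens rather than a dense grid, a forward pass scales with the number of objects in the scene, which is the intuition behind the reported inference speedup; the per-object attention weights also make it possible to inspect which objects the planner attends to.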