Inspired by the way humans rely on multiple sensory organs to perceive the world, end-to-end driving systems deploy sensors of different modalities to capture the global context of the 3D scene. Previous works fuse camera and LiDAR inputs through transformers to improve driving performance. These inputs are normally further interpreted as high-level map information to assist navigation tasks. Nevertheless, extracting useful information from complex map input is challenging, because redundant information may mislead the agent and degrade driving performance. We propose a novel approach that efficiently extracts features from vectorized High-Definition (HD) maps and uses them in end-to-end driving tasks. In addition, we design a new expert that further enhances model performance by considering multi-road rules. Experimental results show that both proposed improvements enable our agent to achieve superior performance compared with other methods.