Spatial Retrieval Augmented Autonomous Driving

Existing autonomous driving (AD) systems rely on onboard sensors (cameras, LiDAR, IMU, etc.) for environmental perception. However, this paradigm is limited by the drive-time perception horizon and often fails under a limited field of view, occlusion, or extreme conditions such as darkness and rain. In contrast, human drivers can recall road structure even under poor visibility. To endow models with this "recall" ability, we propose the spatial retrieval paradigm, which introduces offline-retrieved geographic images as an additional input. These images are easy to obtain from offline caches (e.g., Google Maps or stored autonomous driving datasets) without requiring additional sensors, making the paradigm a plug-and-play extension for existing AD tasks. For experiments, we first extend the nuScenes dataset with geographic images retrieved via Google Maps APIs and align the new data with ego-vehicle trajectories. We establish baselines across five core autonomous driving tasks: object detection, online mapping, occupancy prediction, end-to-end planning, and generative world modeling. Extensive experiments show that the extended modality can improve performance on certain tasks. We will open-source the dataset curation code, data, and benchmarks for further study of this new autonomous driving paradigm.
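To make the retrieval and alignment step concrete, the sketch below shows one way ego-vehicle positions could be mapped to geographic image tiles. It is an illustration under stated assumptions, not the pipeline used in this work: the function names, the flat-earth conversion, the assumption that trajectory coordinates are east/north offsets in meters from a known reference point, and the choice of the standard Web Mercator tile scheme are all hypothetical. No map service is actually called.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 equatorial radius


def local_to_latlon(east_m, north_m, origin_lat, origin_lon):
    """Convert metric east/north offsets to latitude/longitude (degrees).

    Assumption: a flat-earth approximation around a known reference point,
    which is only reasonable for offsets of a few kilometers.
    """
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(origin_lat)))
    )
    return origin_lat + dlat, origin_lon + dlon


def latlon_to_tile(lat, lon, zoom):
    """Return the Web Mercator (x, y) tile index containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return x, y


def tiles_along_trajectory(trajectory, origin_lat, origin_lon, zoom):
    """List the distinct tiles an ego trajectory passes through, in order.

    `trajectory` is a sequence of (east_m, north_m) offsets (hypothetical
    format); the returned tile indices could serve as keys into an offline
    cache of geographic images.
    """
    tiles = []
    for east_m, north_m in trajectory:
        lat, lon = local_to_latlon(east_m, north_m, origin_lat, origin_lon)
        tile = latlon_to_tile(lat, lon, zoom)
        if tile not in tiles:
            tiles.append(tile)
    return tiles
```

Keying the cache by tile index rather than by raw coordinates means nearby ego poses reuse the same retrieved image, which fits the offline-cache framing above. A real pipeline would also need the dataset's actual georeferencing and a rotation of each image into the ego frame.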
