In this paper, we investigate the application of Vehicle-to-Everything (V2X) communication to improving the perception performance of autonomous vehicles. We present a robust cooperative perception framework with V2X communication using a novel vision Transformer. Specifically, we build a holistic attention model, namely V2X-ViT, to effectively fuse information across on-road agents (i.e., vehicles and infrastructure). V2X-ViT consists of alternating layers of heterogeneous multi-agent self-attention and multi-scale window self-attention, which capture inter-agent interaction and per-agent spatial relationships, respectively. These key modules are designed in a unified Transformer architecture to handle common V2X challenges, including asynchronous information sharing, pose errors, and the heterogeneity of V2X components. To validate our approach, we create a large-scale V2X perception dataset using CARLA and OpenCDA. Extensive experimental results demonstrate that V2X-ViT sets new state-of-the-art performance for 3D object detection and achieves robust performance even in harsh, noisy environments. The code is available at https://github.com/DerrickXuNu/v2x-vit.
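To make the alternating structure concrete, below is a minimal PyTorch sketch of one V2X-ViT-style block: inter-agent attention across agents at each spatial location, followed by per-agent window attention at multiple scales. This is an illustration of the idea, not the authors' implementation (which lives at the repository above); the class names, the per-type embedding, the two fixed window sizes, and the averaging of the multi-scale branches are all simplifying assumptions of this sketch.

```python
import torch
import torch.nn as nn

class HeteroMultiAgentSelfAttention(nn.Module):
    """Inter-agent attention: at every spatial location, each agent's
    feature attends to all other agents', with a learned per-type
    embedding (vehicle vs. infrastructure) injecting heterogeneity.
    (Simplified stand-in for the paper's heterogeneous multi-agent
    self-attention.)"""
    def __init__(self, dim, num_heads=8, num_agent_types=2):
        super().__init__()
        self.type_embed = nn.Embedding(num_agent_types, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x, agent_types):
        # x: (L, N, C) features from L agents over N spatial tokens
        x = x + self.type_embed(agent_types)[:, None, :]
        tokens = x.permute(1, 0, 2)            # (N, L, C): agents form the sequence
        out, _ = self.attn(tokens, tokens, tokens)
        return out.permute(1, 0, 2)

class MultiScaleWindowAttention(nn.Module):
    """Per-agent spatial attention inside local windows at two scales;
    averaging the two branches is an assumed simplification of the
    paper's multi-scale fusion."""
    def __init__(self, dim, num_heads=8, window_sizes=(4, 8)):
        super().__init__()
        self.window_sizes = window_sizes
        self.attns = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in window_sizes)

    def forward(self, x, hw):
        (H, W), (L, N, C) = hw, x.shape        # N == H * W
        outs = []
        for ws, attn in zip(self.window_sizes, self.attns):
            t = x.view(L, H // ws, ws, W // ws, ws, C)
            t = t.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
            o, _ = attn(t, t, t)               # attention within each window
            o = o.view(L, H // ws, W // ws, ws, ws, C)
            outs.append(o.permute(0, 1, 3, 2, 4, 5).reshape(L, N, C))
        return torch.stack(outs).mean(0)

class V2XViTBlock(nn.Module):
    """One alternating layer: inter-agent fusion, then spatial refinement."""
    def __init__(self, dim):
        super().__init__()
        self.hmsa = HeteroMultiAgentSelfAttention(dim)
        self.mswin = MultiScaleWindowAttention(dim)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, agent_types, hw):
        x = x + self.hmsa(self.norm1(x), agent_types)
        x = x + self.mswin(self.norm2(x), hw)
        return x

# Toy usage: 3 agents (two vehicles, one infrastructure) sharing
# 16x16 bird's-eye-view feature maps with 64 channels.
block = V2XViTBlock(64)
x = torch.randn(3, 16 * 16, 64)
y = block(x, torch.tensor([0, 0, 1]), (16, 16))
print(y.shape)  # torch.Size([3, 256, 64])
```

Stacking several such blocks lets inter-agent cues and per-agent spatial context refine each other in turn, which is the essence of the alternating design described in the abstract.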