Effective bird's-eye-view (BEV) object detection on infrastructure can greatly improve traffic scene understanding and vehicle-to-infrastructure (V2I) cooperative perception. However, cameras installed on infrastructure have varied poses, and previous BEV detection methods rely on accurate calibration, which is difficult to maintain in practice because of unavoidable natural factors (e.g., wind and snow). In this paper, we propose a Calibration-free BEV Representation (CBR) network, which achieves 3D detection based on BEV representation without calibration parameters or additional depth supervision. Specifically, we use two multi-layer perceptrons to decouple features from the perspective view into the front view and the bird's-eye view under box-induced foreground supervision. Then, a cross-view feature fusion module matches features from the orthogonal views according to similarity and enhances the BEV features with front-view features. Experimental results on DAIR-V2X demonstrate that CBR achieves acceptable performance without any camera parameters and is inherently unaffected by calibration noise. We hope CBR can serve as a baseline for future research addressing the practical challenges of infrastructure perception.
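The two steps described in the abstract, MLP-based view decoupling followed by similarity-based cross-view fusion, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names (`make_view_mlp`, `view_mlp`, `cross_view_fusion`), the two-layer MLP acting on the flattened spatial axis, the scaled dot-product similarity, and the residual addition are all assumptions chosen for illustration, and the box-induced foreground supervision is omitted. Weights are random, so the sketch only shows the data flow and tensor shapes.

```python
import math
import random


def linear(x, w, b):
    """Affine map y = W x + b for a vector x; w is a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(v):
    return [max(0.0, a) for a in v]


def init_linear(n_in, n_out, rng):
    """Random weights and zero bias (illustrative initialisation only)."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def make_view_mlp(n_in, n_hidden, n_out, rng):
    """Hypothetical two-layer MLP mapping n_in perspective cells to n_out target cells."""
    return [init_linear(n_in, n_hidden, rng), init_linear(n_hidden, n_out, rng)]


def view_mlp(feat, mlp):
    """Apply the MLP along the spatial axis of each channel.

    feat is C channels, each a flat list of perspective-view cells.
    No camera intrinsics or extrinsics enter this mapping.
    """
    (w1, b1), (w2, b2) = mlp
    return [linear(relu(linear(channel, w1, b1)), w2, b2) for channel in feat]


def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]


def columns(feat):
    """Transpose between C x N (channel-major) and N x C (cell-major) layouts."""
    return [list(col) for col in zip(*feat)]


def cross_view_fusion(bev, front):
    """Enhance each BEV cell with front-view cells weighted by feature similarity."""
    bev_cells, front_cells = columns(bev), columns(front)
    c = len(bev)
    fused = []
    for q in bev_cells:
        # Assumed similarity measure: scaled dot product between cell features.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(c) for k in front_cells]
        weights = softmax(scores)
        context = [sum(w * k[i] for w, k in zip(weights, front_cells)) for i in range(c)]
        # Assumed enhancement: residual addition of the matched front-view context.
        fused.append([a + b for a, b in zip(q, context)])
    return columns(fused)  # back to C x N_bev


rng = random.Random(0)
C, N_PV, N_FV, N_BEV = 4, 12, 6, 8  # toy sizes, not taken from the paper
pv = [[rng.random() for _ in range(N_PV)] for _ in range(C)]

to_front = make_view_mlp(N_PV, 16, N_FV, rng)
to_bev = make_view_mlp(N_PV, 16, N_BEV, rng)

front = view_mlp(pv, to_front)   # C x N_FV front-view features
bev = view_mlp(pv, to_bev)       # C x N_BEV bird's-eye-view features
enhanced = cross_view_fusion(bev, front)
print(len(enhanced), len(enhanced[0]))  # 4 8
```

Because both view mappings are learned from image features alone, nothing in this data flow depends on calibration parameters, which is the property the abstract emphasises. A real model would operate on 2D feature maps from an image backbone and train the weights end to end.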