Recent LSS-based multi-view 3D object detectors have made tremendous progress by processing features in Bird's-Eye-View (BEV) with a convolutional detector. However, a typical convolution ignores the radial symmetry of the BEV features, which makes the detector harder to optimize. To preserve this inherent property of the BEV features and ease optimization, we propose an azimuth-equivariant convolution (AeConv) and an azimuth-equivariant anchor. The sampling grid of AeConv is always aligned with the radial direction, so it can learn azimuth-invariant BEV features. The proposed anchor enables the detection head to learn to predict azimuth-irrelevant targets. In addition, we introduce a camera-decoupled virtual depth to unify depth prediction for images with different camera intrinsic parameters. The resulting detector is dubbed Azimuth-equivariant Detector (AeDet). Extensive experiments on nuScenes show that AeDet achieves 62.0% NDS, surpassing recent multi-view 3D object detectors such as PETRv2 and BEVDepth by a large margin. Project page: https://fcjian.github.io/aedet.
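The idea of a sampling grid that always follows the radial direction can be illustrated with a small sketch. This is not the authors' implementation: the function name `radial_sampling_grid`, the placement of the ego vehicle at the BEV origin, and the choice to rotate a regular grid by the azimuth `atan2(y, x)` of each location are assumptions made only for illustration.

```python
import numpy as np


def radial_sampling_grid(x, y, kernel_size=3):
    """Hypothetical helper: BEV sampling locations for a kernel centred at (x, y).

    The regular k x k grid of an ordinary convolution is rotated by the
    azimuth of (x, y), so its first axis points along the radial direction.
    Assumes the ego vehicle sits at the BEV origin (0, 0).
    """
    half = kernel_size // 2
    steps = np.arange(-half, half + 1)
    # Regular grid offsets, as an ordinary convolution would use them.
    dx, dy = np.meshgrid(steps, steps, indexing="ij")
    offsets = np.stack([dx.ravel(), dy.ravel()], axis=1).astype(float)

    # Rotate the offsets by the azimuth of the kernel centre.
    azimuth = np.arctan2(y, x)
    c, s = np.cos(azimuth), np.sin(azimuth)
    rotation = np.array([[c, -s], [s, c]])
    return np.array([x, y], dtype=float) + offsets @ rotation.T


# Two locations at the same radius, 90 degrees apart in azimuth.
grid_a = radial_sampling_grid(10.0, 0.0)
grid_b = radial_sampling_grid(0.0, 10.0)

# Rotating the first grid by 90 degrees about the origin reproduces the second,
# i.e. the sampling pattern is the same relative to the radial direction.
rot90 = np.array([[0.0, -1.0], [1.0, 0.0]])
print(np.allclose(grid_a @ rot90.T, grid_b))
```

Under these assumptions, the sampling pattern depends only on the position relative to the radial direction, which is the property the abstract attributes to AeConv. In a real detector, the rotated locations would fall between BEV cells and the features would need to be interpolated; that step is left out here.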