In this research, we propose BEVDepth, a new 3D object detector with trustworthy depth estimation for camera-based Bird's-Eye-View (BEV) 3D object detection. Our work is based on a key observation: depth estimation in recent approaches is surprisingly inadequate, even though depth is essential to camera-based 3D detection. BEVDepth resolves this by leveraging explicit depth supervision. We also introduce a camera-aware depth estimation module to improve depth prediction. In addition, we design a novel Depth Refinement Module to counter the side effects of imprecise feature unprojection. Aided by a customized Efficient Voxel Pooling operation and a multi-frame mechanism, BEVDepth achieves a new state-of-the-art 60.9% NDS on the challenging nuScenes test set while maintaining high efficiency. This is the first time the NDS of a camera-based model has reached 60%.
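To make the idea of explicit depth supervision concrete, the following is a minimal sketch for a single pixel. It rests on assumptions that the abstract does not state: depth is discretized into uniform bins, the network outputs one logit per bin, and the loss is a cross-entropy against the bin containing the ground-truth depth. The function names, the depth range, and the bin count are hypothetical and chosen only for illustration.

```python
import math


def depth_to_bin(depth_m, d_min=2.0, d_max=58.0, num_bins=112):
    """Map a metric depth to a uniform bin index, or None if out of range.

    The range and bin count are illustrative assumptions, not values
    taken from the abstract.
    """
    if not (d_min <= depth_m < d_max):
        return None
    bin_size = (d_max - d_min) / num_bins
    return int((depth_m - d_min) / bin_size)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def depth_loss(logits, gt_depth_m, d_min=2.0, d_max=58.0):
    """Cross-entropy between a predicted depth distribution and the
    ground-truth depth bin for one pixel (assumed loss form).

    The number of bins is taken from len(logits). Pixels without a
    valid ground-truth depth contribute zero loss.
    """
    target = depth_to_bin(gt_depth_m, d_min, d_max, num_bins=len(logits))
    if target is None:
        return 0.0
    probs = softmax(logits)
    return -math.log(probs[target])


# Usage: four bins covering 2 m to 6 m, ground-truth depth 3.5 m (bin 1).
uniform = depth_loss([0.0, 0.0, 0.0, 0.0], 3.5, d_min=2.0, d_max=6.0)
peaked = depth_loss([0.0, 5.0, 0.0, 0.0], 3.5, d_min=2.0, d_max=6.0)
```

In this sketch, a prediction that concentrates probability on the correct bin (`peaked`) yields a lower loss than an uninformative one (`uniform`), which is the signal that supervision would provide to a depth estimation module.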