To achieve accurate and low-cost 3D object detection, existing methods enhance camera-based multi-view detectors with spatial cues from the LiDAR modality, e.g., dense depth supervision and bird's-eye-view (BEV) feature distillation. However, they conduct direct point-to-point mimicking from LiDAR to camera, which neglects the inner geometry of foreground targets and suffers from the modality gap between 2D and 3D features. In this paper, we propose a scheme for learning Target Inner-Geometry from the LiDAR modality in camera-based BEV detectors, covering both dense depth and BEV features, termed TiG-BEV. First, we introduce an inner-depth supervision module that learns the low-level relative depth relations between different foreground pixels. This enables the camera-based detector to better understand object-wise spatial structures. Second, we design an inner-feature BEV distillation module that imitates the high-level semantics of different keypoints within foreground targets. To further alleviate the BEV feature gap between the two modalities, we adopt both inter-channel and inter-keypoint distillation for feature-similarity modeling. With our target inner-geometry distillation, TiG-BEV boosts BEVDepth by +2.3% NDS and +2.4% mAP, and BEVDet by +9.1% NDS and +10.3% mAP, on the nuScenes val set. Code will be available at https://github.com/ADLab3Ds/TiG-BEV.
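The two ideas in the abstract, relative depth inside a target and similarity-based feature distillation, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss formulas, so the reference-pixel choice, the use of cosine similarity, the mean-squared-error penalty, and all function names below are assumptions made purely for illustration.

```python
import math

def relative_depth(depths, ref_index):
    """Depth of each foreground pixel relative to one reference pixel (assumed scheme)."""
    ref = depths[ref_index]
    return [d - ref for d in depths]

def inner_depth_loss(pred_depths, lidar_depths, ref_index=0):
    """MSE between predicted and LiDAR relative depths within one target."""
    pred_rel = relative_depth(pred_depths, ref_index)
    gt_rel = relative_depth(lidar_depths, ref_index)
    return sum((p - g) ** 2 for p, g in zip(pred_rel, gt_rel)) / len(pred_rel)

def _normalize(vec):
    # Guard against a zero vector so the division is always defined.
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def similarity_matrix(rows):
    """Pairwise cosine similarity between the rows of a 2D list."""
    unit = [_normalize(r) for r in rows]
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]

def _transpose(rows):
    return [list(col) for col in zip(*rows)]

def _mse(a, b):
    count = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / count

def inner_feature_loss(student, teacher):
    """student, teacher: K keypoints x C channels sampled inside one target.

    Returns (inter_keypoint_loss, inter_channel_loss). Both compare similarity
    matrices rather than raw features, so no point-to-point match is required.
    """
    inter_keypoint = _mse(similarity_matrix(student), similarity_matrix(teacher))
    inter_channel = _mse(similarity_matrix(_transpose(student)),
                         similarity_matrix(_transpose(teacher)))
    return inter_keypoint, inter_channel
```

Under these assumptions, a camera prediction that is offset from the LiDAR depth by a constant still gets zero inner-depth loss, and a teacher feature map that differs from the student only by a global scale gets (numerically) zero similarity loss. A direct point-to-point comparison would penalize both cases.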