This paper presents a new approach to boost a single-modality (LiDAR) 3D object detector by teaching it to mimic the features and responses of a multi-modality (LiDAR-image) detector. The approach requires paired LiDAR-image data only while training the single-modality detector; once trained, it needs only LiDAR data at inference. We design a novel framework to realize the approach: response distillation to focus on the crucial response samples and avoid the background samples; sparse-voxel distillation to learn voxel semantics and relations from the estimated crucial voxels; fine-grained voxel-to-point distillation to better attend to features of small and distant objects; and instance distillation to further enhance the deep-feature consistency. Experimental results on the nuScenes dataset show that our approach outperforms all SOTA LiDAR-only 3D detectors and even surpasses the baseline LiDAR-image detector on the key NDS metric, closing about 72% of the mAP gap between the single- and multi-modality detectors.
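To illustrate the core idea behind response distillation, the following is a minimal NumPy sketch, not the paper's actual implementation: the student's response map is matched to the teacher's only at the "crucial" positions (here assumed to be the top-k teacher responses), while background positions are ignored. The function name, the top-k selection rule, and the `keep_ratio` parameter are all illustrative assumptions.

```python
import numpy as np

def response_distillation_loss(student, teacher, keep_ratio=0.1):
    """Hypothetical sketch: penalize student-teacher response mismatch
    only at the crucial samples (top-k teacher responses), so the loss
    is not dominated by the many background positions."""
    t = teacher.ravel()
    s = student.ravel()
    k = max(1, int(keep_ratio * t.size))
    crucial = np.argsort(t)[-k:]        # indices of top-k teacher responses
    diff = s[crucial] - t[crucial]
    return float(np.mean(diff ** 2))    # L2 distillation loss on crucial samples

# Toy usage: a student identical to the teacher incurs zero loss,
# while a perturbed student incurs a positive loss.
rng = np.random.default_rng(0)
teacher = rng.random((8, 8))
loss_same = response_distillation_loss(teacher, teacher)
loss_diff = response_distillation_loss(teacher + 0.5, teacher)
```

The selection step is what distinguishes this from plain feature matching: gradients flow only through the handful of positions the teacher deems important.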