3D object detection from lidar or camera sensors is essential for autonomous driving. Pioneering attempts at multi-modality fusion complement the sparse lidar point clouds with rich semantic texture information from images, at the cost of extra network design and computational overhead. In this work, we propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models with the guidance of rich context painting, at no extra computational cost during inference. Our key design is to first exploit the instructive semantic knowledge latent in the ground-truth labels by training a semantic-painted teacher model, and then guide the pure-lidar network to learn the semantic-painted representation via knowledge passing modules at different granularities: class-wise passing, pixel-wise passing, and instance-wise passing. Experimental results show that the proposed SPNet can seamlessly cooperate with most existing 3D detection frameworks, yielding a 1~5% AP gain, and even achieves new state-of-the-art 3D detection performance on the KITTI test benchmark. Code is available at: https://github.com/jb892/SPNet.
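To make the three granularities of knowledge passing concrete, the following is a minimal, hypothetical sketch of what distillation losses at the class, pixel, and instance level could look like in PyTorch. The tensor shapes, function names, and the choice of plain KL/MSE terms are illustrative assumptions, not the implementation described in the paper.

```python
# Hypothetical sketch of three knowledge-passing losses (class-wise,
# pixel-wise, instance-wise) for distilling a semantic-painted teacher
# into a pure-lidar student. Shapes and loss forms are assumptions,
# not taken from the SPNet code.
import torch
import torch.nn.functional as F


def class_wise_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between teacher and student class-score distributions."""
    t = temperature
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)


def pixel_wise_loss(student_feat, teacher_feat):
    """MSE between dense BEV feature maps, matched pixel by pixel."""
    return F.mse_loss(student_feat, teacher_feat)


def instance_wise_loss(student_feat, teacher_feat, instance_masks):
    """MSE restricted to foreground (instance) regions of the feature map."""
    mask = instance_masks.float().unsqueeze(1)            # [B, 1, H, W]
    diff = (student_feat - teacher_feat) ** 2 * mask      # broadcast over channels
    return diff.sum() / mask.sum().clamp(min=1.0)


if __name__ == "__main__":
    # Dummy tensors: batch of 2, 64-channel 128x128 BEV maps, 100 anchors, 3 classes.
    s_feat, t_feat = torch.randn(2, 64, 128, 128), torch.randn(2, 64, 128, 128)
    s_logits, t_logits = torch.randn(2, 100, 3), torch.randn(2, 100, 3)
    masks = torch.randint(0, 2, (2, 128, 128))

    total = (class_wise_loss(s_logits, t_logits)
             + pixel_wise_loss(s_feat, t_feat)
             + instance_wise_loss(s_feat, t_feat, masks))
    print(total.item())
```

Because the teacher is only used to produce these targets during training, the student keeps its original architecture and runtime cost at inference, which matches the "no extra computation cost" claim in the abstract.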