As cameras and LiDAR sensors capture complementary information for autonomous driving, great efforts have been made to develop semantic segmentation algorithms through multi-modality data fusion. However, fusion-based approaches require paired data, i.e., LiDAR point clouds and camera images with strict point-to-pixel mappings, as inputs in both training and inference, which seriously hinders their application in practical scenarios. Thus, in this work, we propose 2D Priors Assisted Semantic Segmentation (2DPASS), a general training scheme that boosts representation learning on point clouds by fully exploiting 2D images with rich appearance information. In practice, by leveraging auxiliary modal fusion and multi-scale fusion-to-single knowledge distillation (MSFSKD), 2DPASS acquires richer semantic and structural information from the multi-modal data, which is then distilled online into the pure 3D network. As a result, equipped with 2DPASS, our baseline shows significant improvement with point cloud inputs only. Specifically, it achieves state-of-the-art results on two large-scale benchmarks (i.e., SemanticKITTI and NuScenes), including top-1 results in both the single-scan and multi-scan competitions of SemanticKITTI.
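To make the fusion-to-single distillation idea concrete, below is a minimal NumPy sketch of a per-point knowledge-distillation loss: the multi-modal fusion branch acts as the teacher and the pure 3D branch as the student, so that at inference only the 3D branch (and hence no camera input) is needed. This is an illustrative assumption of the general scheme, not the paper's actual MSFSKD implementation; the function names and the plain KL-divergence formulation are hypothetical simplifications.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over class logits
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fusion_to_single_kd_loss(logits_3d, logits_fused, eps=1e-8):
    """Sketch of a fusion-to-single distillation loss (hypothetical).

    KL(p_fused || p_3d), averaged over points: the pure 3D branch
    (student) is pushed toward the 2D+3D fusion branch (teacher).
    In training the teacher term would be detached from the graph.
    """
    p_fused = softmax(logits_fused)  # teacher: multi-modal fusion branch
    p_3d = softmax(logits_3d)        # student: pure 3D branch
    kl = (p_fused * (np.log(p_fused + eps) - np.log(p_3d + eps))).sum(-1)
    return kl.mean()
```

Because only the student branch survives to inference, the camera images (and the strict point-to-pixel pairing) are required during training alone, which is the practical advantage the abstract emphasizes.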