3D LiDAR (light detection and ranging) semantic segmentation is important for scene understanding in many applications, such as autonomous driving and robotics. For example, for autonomous cars equipped with RGB cameras and LiDAR, it is crucial to fuse complementary information from the different sensors for robust and accurate segmentation. Existing fusion-based methods, however, may not achieve promising performance due to the vast difference between the two modalities. In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from the two modalities, namely, appearance information from RGB images and spatio-depth information from point clouds. To this end, we first project the point clouds into the camera coordinate frame to provide spatio-depth information for the RGB images. Then, we propose a two-stream network to extract features from the two modalities separately and fuse the features with effective residual-based fusion modules. Moreover, we propose additional perception-aware losses to measure the perceptual difference between the two modalities. Extensive experiments on two benchmark datasets show the superiority of our method. For example, on nuScenes, our PMF outperforms the state-of-the-art method by 0.8 mIoU.
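As a rough illustration of the projection step described above, the sketch below projects LiDAR points into the camera coordinate frame and rasterizes a spatio-depth map aligned with the RGB image. It is a minimal NumPy sketch under assumed inputs (a LiDAR-to-camera extrinsic matrix `T_cam_lidar`, a camera intrinsic matrix `K`, and the image size); the function name and channel layout are illustrative and are not taken from the released PMF code.

```python
import numpy as np

def project_lidar_to_camera(points_xyz, T_cam_lidar, K, image_hw):
    """Project LiDAR points into the camera frame and rasterize a
    spatio-depth map aligned with the RGB image (illustrative sketch).

    points_xyz:  (N, 3) LiDAR points in the LiDAR frame.
    T_cam_lidar: (4, 4) extrinsic transform from the LiDAR to the camera frame.
    K:           (3, 3) camera intrinsic matrix.
    image_hw:    (H, W) size of the RGB image.
    """
    H, W = image_hw

    # Transform points into the camera coordinate frame (homogeneous coords).
    pts_h = np.concatenate([points_xyz, np.ones((len(points_xyz), 1))], axis=1)
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]

    # Perspective projection onto the image plane.
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)

    # Discard projections that fall outside the image.
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, pts_cam = u[valid], v[valid], pts_cam[valid]

    # Rasterize a 4-channel (x, y, z, range) map; unobserved pixels stay 0.
    # Note: overlapping points simply overwrite; z-buffering is omitted here.
    spatio_depth = np.zeros((H, W, 4), dtype=np.float32)
    spatio_depth[v, u, :3] = pts_cam
    spatio_depth[v, u, 3] = np.linalg.norm(pts_cam, axis=1)
    return spatio_depth
```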
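The residual-based fusion modules can likewise be pictured with a short PyTorch sketch: features from one stream are concatenated with the other stream's features, reduced, gated, and added back as a residual, so the block can fall back to the unfused features when the extra modality is uninformative. The class name, channel arguments, and gating design are assumptions for illustration, not the exact module from the paper.

```python
import torch
import torch.nn as nn

class ResidualFusionBlock(nn.Module):
    """Fuse LiDAR-stream features into the camera stream as a learned residual
    (illustrative sketch of a residual-based fusion module)."""

    def __init__(self, cam_channels, lidar_channels):
        super().__init__()
        in_ch = cam_channels + lidar_channels
        # Squeeze the concatenated two-stream features back to the camera width.
        self.reduce = nn.Sequential(
            nn.Conv2d(in_ch, cam_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(cam_channels),
            nn.ReLU(inplace=True),
        )
        # Per-pixel gate deciding how much of the fused residual to add back.
        self.gate = nn.Sequential(
            nn.Conv2d(cam_channels, cam_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_cam, feat_lidar):
        fused = self.reduce(torch.cat([feat_cam, feat_lidar], dim=1))
        return feat_cam + self.gate(fused) * fused
```

In a two-stream network, a block of this kind would typically be inserted at several feature scales so that fusion happens throughout the encoder rather than only at the input or output.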