Camera and 3D LiDAR sensors have become indispensable devices in modern autonomous driving vehicles: the camera provides fine-grained texture and color information in 2D space, while LiDAR captures more precise and longer-range distance measurements of the surrounding environment. The complementary information from these two sensors makes two-modality fusion a desirable option. However, two major issues hinder the performance of camera-LiDAR fusion, \ie, how to effectively fuse the two modalities and how to precisely align them (which suffers from the weak spatiotemporal synchronization problem). In this paper, we propose a coarse-to-fine LiDAR and camera fusion-based network (termed LIF-Seg) for LiDAR segmentation. For the first issue, unlike previous works that fuse point cloud and image information in a one-to-one manner, the proposed method fully utilizes the contextual information of images and introduces a simple but effective early-fusion strategy. For the second issue, an offset rectification approach is designed to align the features of the two modalities despite the weak spatiotemporal synchronization. The cooperation of these two components leads to effective camera-LiDAR fusion. Experimental results on the nuScenes dataset show the superiority of the proposed LIF-Seg over existing methods by a large margin. Ablation studies and analyses demonstrate that LIF-Seg can effectively tackle the weak spatiotemporal synchronization problem.
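To make the two ideas concrete, the following is a minimal PyTorch sketch of (1) early fusion of per-point LiDAR features with image features gathered at the projected pixel locations, and (2) an offset-rectification head that predicts a per-point 2D correction to those projections to compensate for weak spatiotemporal synchronization. All module names, feature dimensions, and the calibration convention here are illustrative assumptions, not the paper's exact architecture.

```python
# Hedged sketch: early camera-LiDAR fusion with a learned 2D offset that
# rectifies projection misalignment. Names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


def project_points(points_cam, intrinsics):
    """Project 3D points (N, 3) in camera coordinates to pixel coords (N, 2)."""
    uvw = points_cam @ intrinsics.T              # (N, 3): [u*z, v*z, z]
    return uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)


def sample_image_features(feat_map, uv, img_size):
    """Bilinearly sample a (C, H, W) feature map at pixel coords uv (N, 2)."""
    h, w = img_size
    # grid_sample expects coordinates normalized to [-1, 1].
    grid = torch.stack([uv[:, 0] / (w - 1) * 2 - 1,
                        uv[:, 1] / (h - 1) * 2 - 1], dim=-1)
    grid = grid.view(1, 1, -1, 2)                # (1, 1, N, 2)
    sampled = F.grid_sample(feat_map[None], grid, align_corners=True)
    return sampled[0, :, 0].T                    # (N, C)


class OffsetRectifiedFusion(nn.Module):
    """Coarse-to-fine fusion: sample image features, predict a pixel offset
    from the concatenated two-modality features, then re-sample and fuse."""

    def __init__(self, point_dim=64, img_dim=64):
        super().__init__()
        # Hypothetical offset head: per-point (du, dv) correction.
        self.offset_head = nn.Sequential(
            nn.Linear(point_dim + img_dim, 64), nn.ReLU(),
            nn.Linear(64, 2))
        self.fuse = nn.Linear(point_dim + img_dim, point_dim)

    def forward(self, point_feat, points_cam, feat_map, intrinsics, img_size):
        uv = project_points(points_cam, intrinsics)
        coarse_img_feat = sample_image_features(feat_map, uv, img_size)
        # Rectify the (possibly misaligned) projection, then re-sample.
        offset = self.offset_head(torch.cat([point_feat, coarse_img_feat], -1))
        fine_img_feat = sample_image_features(feat_map, uv + offset, img_size)
        return self.fuse(torch.cat([point_feat, fine_img_feat], -1))
```

The sketch samples from a CNN feature map rather than a single raw pixel, which is one plausible way to exploit image context instead of a strict one-to-one point-pixel correspondence; the learned offset plays the role of the rectification step described above.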