LiDAR and camera are two different sensors that supply complementary information about 3D scenes: geometric (point clouds) and semantic (RGB images), respectively. However, existing methods still struggle to fuse data from the two sensors so that they complement each other for high-quality 3D object detection (3OD). We propose ImLiDAR, a new 3OD paradigm that narrows cross-sensor discrepancies by progressively fusing the multi-scale features of camera Images and LiDAR point clouds. ImLiDAR provides the detection head with robustly fused cross-sensor features. Two core designs make this possible. First, we propose a cross-sensor dynamic message propagation module that combines the best of the multi-scale image and point features. Second, we formulate detection as a direct set prediction problem, which allows us to design an effective set-based detector that tackles the inconsistency between classification and localization confidences, as well as the sensitivity to hand-tuned hyperparameters. Moreover, the set-based detector is detachable and can be easily integrated into various detection networks. Comparisons on both the KITTI and SUN-RGBD datasets show clear visual and numerical improvements of ImLiDAR over twenty-three state-of-the-art 3OD methods.
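To illustrate what "direct set prediction" generally involves, the following is a minimal sketch of one-to-one matching between predicted and ground-truth boxes, the step that lets a set-based detector avoid duplicate detections without hand-tuned post-processing. It is not the ImLiDAR implementation: the abstract does not specify the matching cost, so the cost function, the `box_weight` parameter, the box layout `(x, y, z, l, w, h)`, and all function names here are assumptions for illustration. A brute-force search stands in for the Hungarian algorithm that such detectors commonly use.

```python
from itertools import permutations


def box_l1(a, b):
    """L1 distance between two boxes given as (x, y, z, l, w, h) tuples."""
    return sum(abs(p - q) for p, q in zip(a, b))


def match_cost(pred, gt, box_weight=1.0):
    """Hypothetical pairwise cost: low when the predicted class probability
    for the ground-truth label is high and the boxes are close."""
    class_term = -pred["probs"][gt["label"]]
    return class_term + box_weight * box_l1(pred["box"], gt["box"])


def best_assignment(preds, gts):
    """Brute-force one-to-one matching (fine for tiny sets only).

    Returns (total_cost, pairs), where pairs is a list of
    (prediction_index, ground_truth_index) tuples ordered by ground truth.
    """
    assert len(preds) >= len(gts), "need at least as many predictions as targets"
    best_cost, best_pairs = float("inf"), []
    for pred_ids in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(pred_ids))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(pred_ids)]
    return best_cost, best_pairs


# Toy data (made up): three predictions, two ground-truth objects.
preds = [
    {"probs": [0.1, 0.9], "box": (10.0, 0.0, 0.0, 4.0, 2.0, 1.5)},
    {"probs": [0.8, 0.2], "box": (0.0, 0.0, 0.0, 4.0, 2.0, 1.5)},
    {"probs": [0.5, 0.5], "box": (50.0, 50.0, 0.0, 1.0, 1.0, 1.0)},
]
gts = [
    {"label": 0, "box": (0.5, 0.0, 0.0, 4.0, 2.0, 1.5)},
    {"label": 1, "box": (10.0, 0.5, 0.0, 4.0, 2.0, 1.5)},
]

total_cost, pairs = best_assignment(preds, gts)
matched = {p for p, _ in pairs}
# Predictions left unmatched would be trained as "no object".
unmatched = [i for i in range(len(preds)) if i not in matched]
```

Because each ground-truth object is assigned to exactly one prediction, classification and localization are scored jointly for the same pair, which is one way a set-based formulation can address the confidence inconsistency mentioned above. For realistic set sizes, the brute-force loop would be replaced by a polynomial-time assignment solver.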