LiDAR-camera fusion methods have shown impressive performance in 3D object detection. Recent multi-modal methods mainly perform global fusion, in which image features and point cloud features are fused across the whole scene. This practice lacks fine-grained, region-level information and therefore yields suboptimal fusion performance. In this paper, we present the novel Local-to-Global fusion network (LoGoNet), which performs LiDAR-camera fusion at both the local and global levels. Concretely, the Global Fusion (GoF) of LoGoNet builds on previous literature, but we use point centroids exclusively to represent the position of voxel features more precisely, thus achieving better cross-modal alignment. For the Local Fusion (LoF), we first divide each proposal into uniform grids and then project the grid centers onto the images. The image features around the projected grid points are sampled and fused with position-decorated point cloud features, making full use of the rich contextual information around the proposals. We further propose the Feature Dynamic Aggregation (FDA) module to enable information interaction between the locally and globally fused features, producing more informative multi-modal features. Extensive experiments on both the Waymo Open Dataset (WOD) and KITTI show that LoGoNet outperforms all state-of-the-art 3D detection methods. Notably, LoGoNet ranks 1st on the Waymo 3D object detection leaderboard and obtains 81.02 mAPH (L2). For the first time, the detection performance on all three classes surpasses 80 APH (L2) simultaneously. Code will be available at \url{https://github.com/sankin97/LoGoNet}.
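The first two steps of the Local Fusion described above (dividing a proposal into uniform grids and projecting the grid centers onto the image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the box parameterization (center, size, yaw about the z-axis), the grid resolution, and the 3x4 projection matrix are all assumptions made for illustration, and the subsequent image-feature sampling and fusion are omitted.

```python
import numpy as np


def proposal_grid_centers(center, size, yaw, grid=3):
    """Return the (grid**3, 3) cell centers of a uniform grid inside a 3D box.

    Hypothetical helper: the box is given by its center (x, y, z),
    size (l, w, h), and a yaw rotation about the z-axis.
    """
    # Cell-center offsets in normalized box coordinates, inside (-0.5, 0.5).
    ticks = (np.arange(grid) + 0.5) / grid - 0.5
    local = np.stack(
        np.meshgrid(ticks, ticks, ticks, indexing="ij"), axis=-1
    ).reshape(-1, 3)
    # Scale the normalized offsets to the box dimensions.
    local = local * np.asarray(size, dtype=float)
    # Rotate about z by the box yaw, then translate to the box center.
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return local @ rot.T + np.asarray(center, dtype=float)


def project_to_image(points, proj):
    """Project (N, 3) points with an assumed 3x4 projection matrix.

    Returns pixel coordinates (N, 2) and a mask of points with positive depth.
    Points behind the camera get NaN pixel coordinates.
    """
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = homo @ proj.T
    depth = cam[:, 2]
    valid = depth > 1e-6
    uv = np.full((len(points), 2), np.nan)
    uv[valid] = cam[valid, :2] / depth[valid, None]
    return uv, valid


if __name__ == "__main__":
    # Toy projection matrix (assumed values), camera looking along +z.
    P = np.array([[100.0, 0.0, 50.0, 0.0],
                  [0.0, 100.0, 50.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
    grid_pts = proposal_grid_centers((0.0, 0.0, 10.0), (2.0, 2.0, 2.0), 0.0)
    uv, valid = project_to_image(grid_pts, P)
    print(uv[valid][:3])
```

In a full pipeline, the projected coordinates `uv` would serve as the locations around which image features are sampled; in practice the projection matrix would come from the dataset's camera calibration rather than the toy values used here.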