In this paper, we investigate the combination of voxel-based and point-based methods and propose a novel end-to-end two-stage 3D object detector, named SGNet, for point cloud scenes. Voxel-based methods voxelize the scene into regular grids, which can be processed by current advanced feature learning frameworks based on convolutional layers for semantic feature learning, whereas point-based methods can better extract the geometric features of points because the point coordinates are preserved. Combining the two is an effective solution for 3D object detection from point clouds. However, most current methods use a voxel-based detection head with anchors for final classification and localization. Although the preset anchors cover the entire scene, they are not suitable for point cloud detection tasks with larger scenes and multiple categories because of the limitation of voxel size. We therefore propose a voxel-to-point module (VTPM) that captures both semantic and geometric features. The VTPM is a voxel-point-based module that ultimately performs 3D object detection in point space, which is more conducive to detecting small objects and avoids preset anchors at the inference stage. In addition, a Confidence Adjustment Module (CAM) with center-boundary-aware confidence attention is proposed to resolve the misalignment between predicted confidence and proposals in region of interest (RoI) selection. SGNet achieves state-of-the-art results for 3D object detection on the KITTI dataset, especially for small objects such as cyclists. As of September 19, 2021, SGNet ranked 1st on KITTI in 3D and BEV detection of cyclists at the easy difficulty level, and 2nd in 3D detection of cyclists at the moderate level.
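To make the voxel/point distinction concrete, the following is a minimal sketch of voxelization, not SGNet's implementation. The function names `voxelize` and `voxel_centroids`, the 0.5 m voxel size, and the sample points are all illustrative assumptions. It shows how a regular grid assigns points to integer cells, and why the raw coordinates are needed in addition if fine geometry is to be preserved.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Group 3D points by the integer index of the voxel containing them.

    `voxel_size` is a hypothetical (dx, dy, dz) tuple in the same units
    as the point coordinates.
    """
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (
            floor(x / voxel_size[0]),
            floor(y / voxel_size[1]),
            floor(z / voxel_size[2]),
        )
        # Keep the raw coordinates: this is the geometric detail that a
        # purely voxel-based representation would discard.
        voxels[key].append((x, y, z))
    return dict(voxels)


def voxel_centroids(voxels):
    """Mean coordinate per voxel, a simple stand-in for a voxel feature."""
    return {
        key: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for key, pts in voxels.items()
    }


# Illustrative data only (assumed, not taken from KITTI).
points = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (1.2, 0.4, 0.0), (-0.2, 0.0, 0.0)]
voxels = voxelize(points, (0.5, 0.5, 0.5))
centroids = voxel_centroids(voxels)
```

In this sketch the first two points fall into the same voxel, so a grid-only representation merges them into one cell, while the stored coordinates still distinguish them. A smaller voxel size reduces this merging at the cost of a larger grid, which is the voxel-size limitation mentioned above.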