Accurate 3D object detection from point clouds has become a crucial component of autonomous driving. However, the volumetric representations and projection methods used in previous work fail to establish relationships between local point sets. In this paper, we propose the Sparse Voxel-Graph Attention Network (SVGA-Net), a novel end-to-end trainable network that consists mainly of a voxel-graph module and a sparse-to-dense regression module and achieves comparable performance on 3D detection from raw LiDAR data. Specifically, SVGA-Net constructs a local complete graph within each 3D spherical voxel into which the point cloud is divided, and a global KNN graph across all voxels. The local and global graphs serve as an attention mechanism that enhances the extracted features. In addition, the novel sparse-to-dense regression module improves 3D box estimation accuracy by aggregating feature maps at different levels. Experiments on the KITTI detection benchmark demonstrate the efficiency of extending the graph representation to 3D object detection and show that the proposed SVGA-Net achieves decent detection accuracy.
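As an illustrative sketch only (not the authors' implementation), the two graph structures named in the abstract can be built in plain Python as follows. The function names, the fixed `radius`, the caller-supplied `centers`, and the choice to connect voxels by center-to-center distance are all assumptions made for illustration; the abstract does not specify how voxel centers are selected or how features and attention weights are attached to the edges.

```python
from itertools import combinations
from math import dist


def group_into_spherical_voxels(points, centers, radius):
    """For each (assumed, caller-supplied) center, collect the indices of
    points that lie within `radius` of it."""
    return [
        [i for i, p in enumerate(points) if dist(p, c) <= radius]
        for c in centers
    ]


def local_complete_graph(member_indices):
    """Edges of a complete graph over the points of one voxel:
    every unordered pair of member indices."""
    return list(combinations(member_indices, 2))


def global_knn_graph(centers, k):
    """Directed edges from each voxel to its k nearest other voxels,
    measured here (an assumption) by distance between voxel centers."""
    edges = []
    for i, c in enumerate(centers):
        others = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: dist(c, centers[j]),
        )
        edges.extend((i, j) for j in others[:k])
    return edges


# Toy point cloud: three well-separated clusters along the x axis.
points = [(0, 0, 0), (0.5, 0, 0), (0, 0.5, 0), (5, 0, 0), (5.5, 0, 0), (10, 0, 0)]
centers = [(0, 0, 0), (5, 0, 0), (10, 0, 0)]

voxels = group_into_spherical_voxels(points, centers, radius=1.0)
local_edges = [local_complete_graph(v) for v in voxels]
global_edges = global_knn_graph(centers, k=1)
```

In this toy example the first voxel holds three points and therefore three local edges, while each voxel gains one outgoing global edge. A full model would attach learned features to the nodes and compute attention weights over these edges, which this sketch deliberately leaves out.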