We present a unified, efficient and effective framework for point-cloud-based 3D object detection. Our two-stage approach uses both the voxel representation and the raw point cloud to exploit their respective advantages. The first-stage network takes the voxel representation as input and consists only of lightweight convolutional operations, producing a small number of high-quality initial predictions. The coordinates and the indexed convolutional features of each point in an initial prediction are fused with an attention mechanism, preserving both accurate localization and context information. The second stage operates on the interior points and their fused features to further refine the prediction. Our method is evaluated on the KITTI dataset for both 3D and Bird's Eye View (BEV) detection, and achieves state-of-the-art results at a detection rate of 15 FPS.
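To make the fusion step between the two stages more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "indexed convolutional feature" means looking up the feature of the voxel that contains a point, that an initial prediction can be approximated by an axis-aligned box (real KITTI boxes are rotated), and that the attention mechanism can be stood in for by a single sigmoid gate. The abstract specifies none of these details, and all function names, parameters, and values below are hypothetical.

```python
import math


def voxel_index(point, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Return the integer index of the voxel containing an (x, y, z) point."""
    return tuple(math.floor((p - o) / voxel_size) for p, o in zip(point, origin))


def points_in_box(points, box_min, box_max):
    """Keep the points inside an axis-aligned box.

    Assumption: the box stands in for one initial prediction of the first stage.
    """
    return [
        p for p in points
        if all(lo <= c <= hi for c, lo, hi in zip(p, box_min, box_max))
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse(point, conv_feature, gate_weights, gate_bias=0.0):
    """Fuse point coordinates with the indexed convolutional feature.

    Assumption: a scalar gate, computed from the concatenated coordinates
    and feature, rescales the feature before concatenation. This is only
    a stand-in for the attention mechanism mentioned in the abstract.
    """
    joint = list(point) + list(conv_feature)
    score = sum(w * v for w, v in zip(gate_weights, joint)) + gate_bias
    gate = sigmoid(score)
    return list(point) + [gate * f for f in conv_feature]


# Hypothetical first-stage output: one feature vector per occupied voxel.
voxel_features = {(0, 0, 0): [1.0, 2.0], (1, 0, 0): [3.0, 4.0]}
points = [(0.2, 0.1, 0.3), (1.5, 0.4, 0.2), (5.0, 5.0, 5.0)]

# Interior points of one initial prediction, then per-point fusion.
interior = points_in_box(points, (0.0, 0.0, 0.0), (2.0, 1.0, 1.0))
fused = [
    fuse(p, voxel_features[voxel_index(p, 1.0)], gate_weights=[0.0] * 5)
    for p in interior
]
```

With all gate weights set to zero, the gate evaluates to 0.5 for every point, so each fused vector is the point's coordinates followed by its voxel feature scaled by one half. In a trained model the gate weights would be learned, and a second-stage network would consume the fused vectors to refine the box.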