In this paper, we propose PointRCNN for 3D object detection from raw point clouds. The framework is composed of two stages: stage-1 for bottom-up 3D proposal generation and stage-2 for refining proposals in canonical coordinates to obtain the final detection results. Instead of generating proposals from RGB images or projecting the point cloud to a bird's-eye view or voxels as previous methods do, our stage-1 sub-network directly generates a small number of high-quality 3D proposals from the point cloud in a bottom-up manner by segmenting the point cloud of the whole scene into foreground and background points. The stage-2 sub-network transforms the pooled points of each proposal into canonical coordinates to learn better local spatial features, which are combined with the global semantic features of each point learned in stage-1 for accurate box refinement and confidence prediction. Extensive experiments on the 3D detection benchmark of the KITTI dataset show that our proposed architecture outperforms state-of-the-art methods by remarkable margins while using only point clouds as input.
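The canonical transformation applied to the pooled points of each proposal can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `to_canonical` and the exact axis convention are assumptions, here following the KITTI camera frame (x right, y down, z forward, with the box heading given as a rotation about the y axis).

```python
import numpy as np

def to_canonical(points, center, heading):
    """Transform points (N, 3) into the canonical frame of one 3D proposal.

    Hypothetical sketch: translate so the proposal center becomes the
    origin, then rotate about the vertical (y) axis by -heading so the
    box's forward direction aligns with a fixed axis. Local coordinates
    then become pose-invariant, which eases box refinement.
    """
    shifted = points - center  # (N, 3), proposal center at origin
    c, s = np.cos(-heading), np.sin(-heading)
    # Rotation about the y axis by -heading (inverse of the box heading)
    R = np.array([[c,   0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s,  0.0, c]])
    # Apply R to each point (points stored as row vectors)
    return shifted @ R.T
```

Under this convention, a point lying exactly at the proposal center maps to the origin, and points expressed in the box's own frame are recovered regardless of where the box sits in the scene.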