Mask-based pre-training has achieved great success in self-supervised learning for images, video, and language, without manually annotated supervision. However, it has not yet been studied for information-redundant point cloud data in the field of 3D object detection. Because the point clouds in 3D object detection are large-scale, it is infeasible to reconstruct the input point clouds. In this paper, we propose a masked voxel classification network for pre-training on large-scale point clouds. Our key idea is to divide the point clouds into voxel representations and classify whether each voxel contains points. This simple strategy makes the network voxel-aware of object shape, thus improving the performance of 3D object detection. Extensive experiments show the effectiveness of our pre-trained model with 3D object detectors (SECOND, CenterPoint, and PV-RCNN) on three popular datasets (KITTI, Waymo, and nuScenes). Code is publicly available at https://github.com/chaytonmin/Voxel-MAE.
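The key idea can be illustrated with a minimal NumPy sketch that builds the binary occupancy targets and a masked input grid. This is not the released implementation: the function names `voxel_occupancy` and `mask_occupied_voxels`, the voxel size, the point cloud range, and the 50% mask ratio are all assumptions chosen for illustration, and the network and loss are omitted.

```python
import numpy as np

def voxel_occupancy(points, voxel_size, pc_range):
    """Return a boolean grid that is True where a voxel holds at least one point.

    points: (N, 3) array of xyz coordinates.
    voxel_size: (3,) edge lengths; pc_range: (xmin, ymin, zmin, xmax, ymax, zmax).
    Assumes the range is a whole multiple of the voxel size (illustrative choice).
    """
    pc_min = np.asarray(pc_range[:3], dtype=np.float64)
    pc_max = np.asarray(pc_range[3:], dtype=np.float64)
    voxel_size = np.asarray(voxel_size, dtype=np.float64)
    grid_shape = np.round((pc_max - pc_min) / voxel_size).astype(int)

    # Discard points that fall outside the chosen range.
    inside = np.all((points >= pc_min) & (points < pc_max), axis=1)
    idx = np.floor((points[inside] - pc_min) / voxel_size).astype(int)
    idx = np.minimum(idx, grid_shape - 1)  # guard against float rounding at the edge

    occupancy = np.zeros(grid_shape, dtype=bool)
    occupancy[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return occupancy

def mask_occupied_voxels(occupancy, mask_ratio, rng):
    """Hide a random fraction of occupied voxels.

    Returns (visible, target): `visible` is what an encoder would see,
    `target` is the full occupancy grid used as the classification label.
    """
    occupied = np.argwhere(occupancy)
    n_mask = int(round(mask_ratio * len(occupied)))
    chosen = occupied[rng.permutation(len(occupied))[:n_mask]]
    visible = occupancy.copy()
    visible[chosen[:, 0], chosen[:, 1], chosen[:, 2]] = False
    return visible, occupancy

# Toy example: four points, one of them outside the range.
points = np.array([[0.5, 0.5, 0.5],
                   [0.6, 0.4, 0.5],
                   [3.5, 3.5, 3.5],
                   [10.0, 10.0, 10.0]])
occ = voxel_occupancy(points, voxel_size=(1.0, 1.0, 1.0), pc_range=(0, 0, 0, 4, 4, 4))
visible, target = mask_occupied_voxels(occ, mask_ratio=0.5, rng=np.random.default_rng(0))
```

A pre-training objective in this spirit would feed `visible` to the network and score its per-voxel predictions against `target` with a binary classification loss, so the model must infer which hidden voxels were occupied rather than regress point coordinates.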