Mask-based pre-training has achieved great success in self-supervised learning for images and language without manually annotated supervision. However, it has not yet been studied for large-scale point clouds, which carry highly redundant spatial information. In this work, we propose a masked voxel autoencoder network for pre-training on large-scale point clouds, dubbed Voxel-MAE. Our key idea is to transform the point cloud into a voxel representation and classify whether each voxel contains points. This simple yet effective strategy makes the network aware of object shape at the voxel level, thereby improving performance on downstream tasks such as 3D object detection. Owing to the high spatial redundancy of large-scale point clouds, our Voxel-MAE can still learn representative features even with a 90% masking ratio. We also validate the effectiveness of Voxel-MAE on unsupervised domain adaptation tasks, which demonstrates its generalization ability. Voxel-MAE shows that it is feasible to pre-train on large-scale point clouds without data annotations to enhance the perception ability of autonomous vehicles. Extensive experiments with three 3D object detectors (SECOND, CenterPoint, and PV-RCNN) on three popular datasets (KITTI, Waymo, and nuScenes) demonstrate the effectiveness of our pre-training method.
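The pretext task described above, masking most voxels and predicting which of them actually contain points, can be sketched compactly. The snippet below is a minimal illustrative sketch, not the authors' implementation: the plain Transformer encoder, the per-voxel feature size of 10, and all class and parameter names (e.g. VoxelMAEPretext, occupancy_head) are assumptions made for the example.

```python
# Illustrative sketch of a masked-voxel occupancy pretext task (assumed names).
import torch
import torch.nn as nn


class VoxelMAEPretext(nn.Module):
    def __init__(self, voxel_feat_dim=10, embed_dim=128, mask_ratio=0.9):
        super().__init__()
        self.mask_ratio = mask_ratio
        # Project per-voxel features to token embeddings.
        self.voxel_embed = nn.Linear(voxel_feat_dim, embed_dim)
        # Learnable token that replaces masked voxels.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        # Binary head: does this voxel contain any points?
        self.occupancy_head = nn.Linear(embed_dim, 1)

    def forward(self, voxel_feats, occupancy):
        # voxel_feats: (B, N, voxel_feat_dim); occupancy: (B, N) in {0, 1}.
        B, N, _ = voxel_feats.shape
        tokens = self.voxel_embed(voxel_feats)

        # Randomly mask ~90% of the voxels by replacing them with the mask token.
        num_mask = int(self.mask_ratio * N)
        rand_idx = torch.rand(B, N, device=tokens.device).argsort(dim=1)
        mask = torch.zeros(B, N, dtype=torch.bool, device=tokens.device)
        mask.scatter_(1, rand_idx[:, :num_mask], True)
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand(B, N, -1), tokens)

        # Encode all tokens and predict voxel occupancy; the loss is taken on
        # the masked positions only, as in other masked autoencoders.
        logits = self.occupancy_head(self.encoder(tokens)).squeeze(-1)
        return nn.functional.binary_cross_entropy_with_logits(
            logits[mask], occupancy[mask].float())
```

After pre-training with such an objective, the encoder weights would be transferred to initialize the backbone of a downstream 3D detector; the occupancy head is discarded.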