Mask-based pre-training has achieved great success in self-supervised learning for images, video, and language, without manually annotated supervision. However, it has not yet been studied for large-scale point clouds in autonomous driving, which carry highly redundant spatial information. Because the number of points in large-scale point clouds is huge, it is infeasible to reconstruct the input point clouds. In this paper, we propose a masked voxel classification network for pre-training on large-scale point clouds. Our key idea is to divide the point clouds into voxel representations and classify whether each voxel contains points. This simple strategy makes the network voxel-aware of object shape, thus improving performance on downstream tasks such as 3D object detection. Owing to the high spatial redundancy of large-scale point clouds, our Voxel-MAE can still learn representative features even with a 90% masking ratio. We also validate the effectiveness of Voxel-MAE on unsupervised domain adaptation tasks, which demonstrates its generalization ability. Voxel-MAE shows that it is feasible to pre-train on large-scale point clouds without data annotations to enhance the perception ability of autonomous vehicles. Extensive experiments with 3D object detectors (SECOND, CenterPoint, and PV-RCNN) on two popular datasets (KITTI and Waymo) show the effectiveness of our pre-trained model. Code is publicly available at https://github.com/chaytonmin/Voxel-MAE.
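The masked voxel classification idea described above can be illustrated with a minimal, pure-Python sketch. It is not the released Voxel-MAE implementation: the function names (`voxelize`, `mask_voxels`, `occupancy_targets`, `bce_loss`), the toy grid, and the choice of binary cross-entropy as the classification loss are all assumptions made for illustration. The sketch voxelizes a point cloud, hides 90% of the occupied voxels, and builds the per-voxel "contains points or not" labels that a network would be trained to predict from the remaining visible voxels.

```python
import math
import random


def voxelize(points, voxel_size, grid_min, grid_shape):
    """Return the set of (i, j, k) voxel indices that contain at least one point."""
    occupied = set()
    for point in points:
        idx = tuple(
            int(math.floor((c - m) / s))
            for c, m, s in zip(point, grid_min, voxel_size)
        )
        # Discard points that fall outside the grid.
        if all(0 <= i < n for i, n in zip(idx, grid_shape)):
            occupied.add(idx)
    return occupied


def mask_voxels(occupied, mask_ratio, rng):
    """Randomly split the occupied voxels into (visible, masked) sets."""
    voxels = sorted(occupied)
    rng.shuffle(voxels)
    n_masked = int(round(mask_ratio * len(voxels)))
    return set(voxels[n_masked:]), set(voxels[:n_masked])


def occupancy_targets(occupied, grid_shape):
    """Label every voxel in the grid: 1 if it contains points, else 0."""
    nx, ny, nz = grid_shape
    return {
        (i, j, k): 1 if (i, j, k) in occupied else 0
        for i in range(nx)
        for j in range(ny)
        for k in range(nz)
    }


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy (assumed loss) between predictions and labels."""
    total = 0.0
    for idx, t in targets.items():
        p = min(max(probs[idx], eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


# Toy scene (hypothetical): one point in each of 10 voxels along the x-axis,
# plus a second point in the first voxel. The row y in [1, 2) stays empty.
points = [(i + 0.5, 0.5, 0.5) for i in range(10)] + [(0.2, 0.3, 0.1)]
grid_shape = (10, 2, 1)

occupied = voxelize(points, (1.0, 1.0, 1.0), (0.0, 0.0, 0.0), grid_shape)
visible, masked = mask_voxels(occupied, mask_ratio=0.9, rng=random.Random(0))
targets = occupancy_targets(occupied, grid_shape)

# Stand-in for a network that only sees `visible`: an uninformed guess of 0.5.
uniform_probs = {idx: 0.5 for idx in targets}
loss = bce_loss(uniform_probs, targets)
print(len(visible), len(masked), loss)
```

In an actual pre-training setup, the constant predictions would be replaced by the output of a 3D encoder-decoder fed only the visible voxels. The labels come from the raw point cloud itself, which is why no manual annotation is needed.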