Recently, the self-supervised learning framework data2vec has shown inspiring performance for various modalities using a masked student-teacher approach. However, it remains open whether such a framework generalizes to the unique challenges of 3D point clouds. To answer this question, we extend data2vec to the point cloud domain and report encouraging results on several downstream tasks. In an in-depth analysis, we discover that the leakage of positional information reveals the overall object shape to the student even under heavy masking and thus prevents data2vec from learning strong representations for point clouds. We address this 3D-specific shortcoming by proposing point2vec, which unleashes the full potential of data2vec-like pre-training on point clouds. Our experiments show that point2vec outperforms other self-supervised methods on shape classification and few-shot learning on ModelNet40 and ScanObjectNN, while achieving competitive results on part segmentation on ShapeNetParts. These results suggest that the learned representations are strong and transferable, highlighting point2vec as a promising direction for self-supervised learning of point cloud representations.