Recent work on 4D point cloud sequences has attracted considerable attention. However, obtaining exhaustively labeled 4D datasets is expensive and laborious, so it is especially important to investigate how to exploit raw unlabeled data. Yet most existing self-supervised point cloud representation learning methods only consider geometry from a static snapshot, ignoring the fact that sequential observations of a dynamic scene can reveal more comprehensive geometric details; meanwhile, video representation learning frameworks mostly model motion as image-space flow and are not 3D-geometric-aware. To overcome these issues, this paper proposes a new 4D self-supervised pre-training method called Complete-to-Partial 4D Distillation. Our key idea is to formulate 4D self-supervised representation learning as a teacher-student knowledge distillation framework and let the student learn useful 4D representations under the guidance of the teacher. Experiments show that this approach significantly outperforms previous pre-training approaches on a wide range of 4D point cloud sequence understanding tasks, including indoor and outdoor scenarios.
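Below is a minimal sketch of what such a teacher-student distillation setup could look like in a PyTorch-style implementation. It is illustrative only: the `SequenceEncoder` architecture, the EMA teacher update, and the cosine-distance loss are assumptions for exposition, not the authors' actual design; the abstract only specifies that a teacher consuming richer (complete) observations guides a student consuming partial ones.

```python
# Hypothetical teacher-student distillation sketch for 4D point cloud sequences.
# The encoder, EMA update, and loss choice are placeholders, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SequenceEncoder(nn.Module):
    """Placeholder 4D encoder: per-point MLP, max-pool over points, average over frames."""
    def __init__(self, in_dim: int = 3, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (B, T, N, 3) point cloud sequence -> (B, feat_dim) sequence feature
        feat = self.mlp(seq)                      # (B, T, N, feat_dim)
        return feat.max(dim=2).values.mean(dim=1)

class CompleteToPartialDistiller(nn.Module):
    """Teacher encodes the complete sequence; student encodes a partial view and imitates it."""
    def __init__(self, feat_dim: int = 256, ema_decay: float = 0.99):
        super().__init__()
        self.student = SequenceEncoder(feat_dim=feat_dim)
        self.teacher = SequenceEncoder(feat_dim=feat_dim)
        self.teacher.load_state_dict(self.student.state_dict())
        for p in self.teacher.parameters():
            p.requires_grad_(False)               # teacher receives no gradients
        self.ema_decay = ema_decay

    @torch.no_grad()
    def update_teacher(self):
        # One common choice: teacher weights are an exponential moving average of the student's.
        for pt, ps in zip(self.teacher.parameters(), self.student.parameters()):
            pt.mul_(self.ema_decay).add_(ps, alpha=1.0 - self.ema_decay)

    def forward(self, complete_seq: torch.Tensor, partial_seq: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            target = F.normalize(self.teacher(complete_seq), dim=-1)  # complete observations
        pred = F.normalize(self.student(partial_seq), dim=-1)          # partial observations
        return (1.0 - (pred * target).sum(dim=-1)).mean()              # cosine-distance loss
```

In a pre-training loop one would compute the loss on paired complete/partial sequences, back-propagate through the student only, and then call `update_teacher()` after each optimizer step; the pre-trained student encoder is what gets fine-tuned on downstream 4D understanding tasks.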