Learning representations for point clouds is an important task in 3D computer vision, especially when manually annotated supervision is unavailable. Previous methods usually rely on auto-encoders that establish self-supervision by reconstructing the input itself. However, existing self-reconstruction based auto-encoders focus merely on global shapes and ignore the hierarchical context between local and global geometries, which is a crucial supervision signal for 3D representation learning. To resolve this issue, we present a novel self-supervised point cloud representation learning framework, named 3D Occlusion Auto-Encoder (3D-OAE). Our key idea is to randomly occlude some local patches of the input point cloud and establish supervision by recovering the occluded patches from the remaining visible ones. Specifically, we design an encoder that learns features of the visible local patches and a decoder that leverages these features to predict the occluded patches. In contrast with previous methods, 3D-OAE can remove a large proportion of patches and predict them from only a small number of visible ones, which enables us to significantly accelerate training and yields a nontrivial self-supervisory task. The trained encoder can be further transferred to various downstream tasks. We demonstrate superior performance over state-of-the-art methods in various discriminative and generative applications on widely used benchmarks.
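As a rough illustration of the occlude-and-recover scheme described above, the sketch below shows a minimal PyTorch-style auto-encoder that embeds pre-grouped local patches, drops a large fraction of them, encodes only the visible ones, and regresses the occluded patches from mask tokens. All module names, dimensions, and the masking ratio here are illustrative assumptions rather than the authors' released implementation; patch grouping and positional embeddings are omitted for brevity.

```python
# Minimal sketch of an occlusion auto-encoder for point-cloud patches.
# Assumes the point cloud has already been grouped into P patches of K points.
import torch
import torch.nn as nn


class OcclusionAutoEncoder(nn.Module):
    def __init__(self, points_per_patch=32, dim=256, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.patch_embed = nn.Linear(points_per_patch * 3, dim)      # embed each local patch
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=4)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))       # placeholder for occluded patches
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.head = nn.Linear(dim, points_per_patch * 3)             # regress patch coordinates

    def forward(self, patches):
        # patches: (B, P, K, 3) local patches of the input point cloud
        B, P, K, _ = patches.shape
        tokens = self.patch_embed(patches.reshape(B, P, K * 3))

        # Randomly occlude a large proportion of patches; encode only the visible ones.
        num_visible = max(1, int(P * (1 - self.mask_ratio)))
        perm = torch.rand(B, P, device=patches.device).argsort(dim=1)
        visible_idx = perm[:, :num_visible]
        visible = torch.gather(
            tokens, 1, visible_idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
        encoded = self.encoder(visible)

        # Decoder sees encoded visible tokens plus mask tokens for the occluded patches.
        masked = self.mask_token.expand(B, P - num_visible, -1)
        decoded = self.decoder(torch.cat([encoded, masked], dim=1))

        # Predict coordinates of the occluded patches from the mask-token positions.
        pred = self.head(decoded[:, num_visible:]).reshape(B, P - num_visible, K, 3)
        return pred, perm[:, num_visible:]   # predictions and indices of occluded patches
```

In such a setup, a reconstruction loss (e.g., Chamfer distance between predicted and ground-truth occluded patches) would supervise training, and only the encoder would be kept for transfer to downstream tasks.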