Annotating large-scale point clouds remains time-consuming, and labels are unavailable for many real-world tasks. Point cloud pre-training is one potential solution for obtaining a scalable model that adapts quickly. In this paper, we therefore investigate a new self-supervised learning approach, called Mixing and Disentangling (MD), for point cloud pre-training. As the name implies, we explore how to separate the original point clouds from a mixed point cloud, and we use this challenging task as the pretext optimization objective for model training. Because the original datasets contain far less training data than the prevailing ImageNet, the mixing process also serves to efficiently generate more high-quality samples. To verify our intuition, we build a baseline network that contains only two modules, an encoder and a decoder. Given a mixed point cloud, the encoder is first pre-trained to extract the semantic embedding. An instance-adaptive decoder is then used to disentangle the point clouds according to that embedding. Albeit simple, the encoder inherently learns to capture point cloud keypoints during training and can be quickly adapted to downstream tasks, including classification and segmentation, through the pre-training and fine-tuning paradigm. Extensive experiments on two datasets show that an encoder pre-trained with MD significantly surpasses the same encoder trained from scratch and converges quickly. In ablation studies, we further examine the effect of each component and discuss the advantages of the proposed self-supervised learning strategy. We hope this self-supervised learning attempt on point clouds can pave the way for reducing the dependence of deep learning models on large-scale labeled data and for lowering annotation costs in the future.
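The mixing step and the disentangling objective described above can be illustrated with a minimal NumPy sketch. The abstract does not specify how clouds are mixed or which reconstruction loss is used, so everything here is an assumption made for illustration: the hypothetical `mix_point_clouds` draws half of the points from each cloud, and the hypothetical `disentangle_loss` scores the two predicted clouds with a squared-distance Chamfer distance. The encoder and the instance-adaptive decoder are not modeled.

```python
import numpy as np

def mix_point_clouds(a, b, rng):
    """Assumed mixing rule: sample half of the points from each (N, 3) cloud
    and shuffle them, so the mixed cloud keeps the original size N."""
    n = a.shape[0]
    half = n // 2
    idx_a = rng.choice(n, size=half, replace=False)
    idx_b = rng.choice(b.shape[0], size=n - half, replace=False)
    mixed = np.concatenate([a[idx_a], b[idx_b]], axis=0)
    return mixed[rng.permutation(n)]

def chamfer_distance(p, q):
    """Symmetric Chamfer distance using squared Euclidean distances."""
    # Pairwise squared distances, shape (len(p), len(q)).
    d = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def disentangle_loss(pred_a, pred_b, a, b):
    """Assumed pretext loss: each predicted cloud should match its original."""
    return chamfer_distance(pred_a, a) + chamfer_distance(pred_b, b)

rng = np.random.default_rng(0)
cloud_a = rng.normal(size=(256, 3))        # synthetic stand-in for one object
cloud_b = rng.normal(size=(256, 3)) + 5.0  # second object, shifted away
mixed = mix_point_clouds(cloud_a, cloud_b, rng)

# A perfect disentangler recovers both originals and incurs zero loss;
# assigning the clouds the wrong way round is penalized.
perfect = disentangle_loss(cloud_a, cloud_b, cloud_a, cloud_b)
swapped = disentangle_loss(cloud_b, cloud_a, cloud_a, cloud_b)
```

In a real pipeline the predictions would come from a decoder conditioned on the encoder's embedding of `mixed`. Because each mixed sample keeps only half of each original, the network must complete the missing geometry rather than merely partition the input points.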