Transformer-based self-supervised representation learning methods learn generic features from unlabeled datasets, providing useful network initialization parameters for downstream tasks. However, self-supervised learning based on masking local surface patches of 3D point cloud data has so far been under-explored. In this paper, we propose Masked Autoencoders for 3D point cloud representation learning (abbreviated as MAE3D), a novel autoencoding paradigm for self-supervised learning. We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract features from the unmasked patches. Next, we employ patch-wise MAE3D Transformers to learn both the local features of point cloud patches and the high-level contextual relationships between patches, and to complete the latent representations of the masked patches. Finally, our Point Cloud Reconstruction Module, trained with a multi-task loss, recovers the complete point cloud. We conduct self-supervised pre-training on ShapeNet55 with a point cloud completion pretext task and fine-tune the pre-trained model on ModelNet40 and ScanObjectNN (PB\_T50\_RS, the hardest variant). Comprehensive experiments demonstrate that the local features our MAE3D extracts from point cloud patches benefit downstream classification tasks, soundly outperforming state-of-the-art methods ($93.4\%$ and $86.2\%$ classification accuracy, respectively).
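To make the patch-splitting and masking step concrete, below is a minimal sketch of how a point cloud might be grouped into local patches and partially masked before encoding. The abstract does not specify the grouping strategy or mask ratio; this sketch assumes farthest point sampling (FPS) centers with kNN grouping and a 75% mask ratio, which are common choices in MAE-style point cloud methods. The helper names `farthest_point_sample` and `split_and_mask` are hypothetical, not part of the paper's code.

```python
import torch

def farthest_point_sample(xyz: torch.Tensor, n_centers: int) -> torch.Tensor:
    """Greedy farthest point sampling; returns indices of patch centers.
    xyz: (N, 3) point cloud."""
    n = xyz.shape[0]
    centers = torch.zeros(n_centers, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    farthest = torch.randint(0, n, (1,)).item()
    for i in range(n_centers):
        centers[i] = farthest
        # Update each point's distance to its nearest chosen center.
        d = ((xyz - xyz[farthest]) ** 2).sum(-1)
        dist = torch.minimum(dist, d)
        farthest = torch.argmax(dist).item()
    return centers

def split_and_mask(xyz: torch.Tensor, n_patches: int = 64,
                   patch_size: int = 32, mask_ratio: float = 0.75):
    """Group a point cloud into kNN patches around FPS centers, then
    randomly mask a fraction of them (assumed MAE-style masking)."""
    centers = xyz[farthest_point_sample(xyz, n_patches)]   # (P, 3)
    d = torch.cdist(centers, xyz)                          # (P, N)
    knn_idx = d.topk(patch_size, largest=False).indices    # (P, k)
    patches = xyz[knn_idx]                                 # (P, k, 3)
    perm = torch.randperm(n_patches)
    n_masked = int(mask_ratio * n_patches)
    masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]
    return patches[visible_idx], patches[masked_idx], centers

# Usage: only the visible patches would feed the Patch Embedding Module;
# the masked patches serve as the completion target.
pts = torch.randn(1024, 3)
visible, masked, centers = split_and_mask(pts)
print(visible.shape, masked.shape)  # (16, 32, 3) and (48, 32, 3)
```

Under these assumptions, the encoder only ever sees the 16 visible patches, and the pretext task asks the decoder to reconstruct the 48 masked ones, which is what makes the completion objective self-supervised.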