We present SeRP, a framework for self-supervised representation learning on 3D point clouds. SeRP consists of an encoder-decoder architecture that takes perturbed or corrupted point clouds as input and aims to reconstruct the original, uncorrupted point cloud. The encoder learns high-level latent representations of the point clouds in a low-dimensional subspace, from which the decoder recovers the original structure. In this work, we use Transformer- and PointNet-based autoencoders. The proposed framework also addresses some limitations of Transformer-based masked autoencoders, which are prone to leakage of location information and uneven information density. We trained our models on the complete ShapeNet dataset and evaluated them on ModelNet40 as a downstream classification task. The pretrained models achieved 0.5-1% higher classification accuracy than the same networks trained from scratch. Furthermore, we also propose VASP: Vector-Quantized Autoencoder for Self-supervised Representation Learning for Point Clouds, which employs vector quantization for discrete representation learning in Transformer-based autoencoders.
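The abstract does not state the reconstruction objective; a common choice for point-cloud autoencoders of this kind is the symmetric Chamfer distance between the reconstructed and original clouds. A minimal NumPy sketch (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3)."""
    # Pairwise squared Euclidean distances, shape (N, M).
    d = np.sum((p[:, None, :] - q[None, :, :]) ** 2, axis=-1)
    # Average nearest-neighbor distance in both directions.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

cloud = np.random.rand(128, 3)
# A perfect reconstruction has zero Chamfer distance to the original.
print(chamfer_distance(cloud, cloud))  # 0.0
```

Because the distance is taken to the nearest neighbor in each direction, the loss is invariant to the ordering of points, which matters for unordered point-cloud outputs.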
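The vector-quantization step that VASP relies on replaces each continuous latent vector with its nearest entry in a learned codebook, yielding discrete codes. A minimal sketch of that lookup (the function name and toy codebook are illustrative, not from the paper):

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each latent in z (N, D) to its nearest codebook entry (K, D)."""
    # Squared distances between latents and codebook entries, shape (N, K).
    d = np.sum((z[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    idx = d.argmin(axis=1)      # discrete code index assigned to each latent
    return codebook[idx], idx   # quantized latents and their indices

codebook = np.eye(4)                      # toy codebook: 4 one-hot entries
z = np.array([[0.9, 0.1, 0.0, 0.0]])
zq, idx = vector_quantize(z, codebook)
print(idx)  # [0] -- nearest entry is the first basis vector
```

In a trained model the codebook entries are learned jointly with the encoder, and the decoder reconstructs the point cloud from the quantized latents rather than the continuous ones.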