In this paper, we present SA-CNN, a hierarchical and lightweight self-attention-based encoding and decoding architecture for representation learning on point cloud data. The proposed SA-CNN introduces convolution and transposed convolution stacks to capture and generate contextual information among unordered 3D points. Following the conventional hierarchical pipeline, the encoding process extracts features in a local-to-global manner, while the decoding process generates features and point clouds in coarse-to-fine, multi-resolution stages. We demonstrate that SA-CNN supports a wide range of applications, namely classification, part segmentation, reconstruction, shape retrieval, and unsupervised classification. While achieving state-of-the-art or comparable performance on the benchmarks, SA-CNN keeps its model complexity several orders of magnitude lower than that of competing methods. In terms of qualitative results, we visualize multi-stage point cloud reconstructions and latent walks on rigid objects as well as on deformable, non-rigid human and robot models.
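To make the role of self-attention on unordered points concrete, the following is a minimal sketch of single-head scaled dot-product self-attention applied to a small point set. It is not the SA-CNN layer itself: the identity query/key/value projections, the function names `softmax` and `self_attention`, and the toy four-point cloud are all assumptions made for illustration, and a trained model would use learned projections inside a hierarchical stack.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(points):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    `points` is a list of equal-length feature vectors (here raw xyz).
    Identity projections are an illustrative assumption, not the paper's layer.
    """
    dim = len(points[0])
    scale = math.sqrt(dim)
    out = []
    for q in points:
        # Similarity of this point to every point in the set.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in points]
        weights = softmax(scores)
        # Attention-weighted average of all point features.
        out.append(
            [sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)]
        )
    return out


# Hypothetical toy cloud and an arbitrary reordering of the same points.
cloud = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
perm = [2, 0, 3, 1]
shuffled = [cloud[i] for i in perm]

attended = self_attention(cloud)
attended_shuffled = self_attention(shuffled)
```

Because every point attends to the whole set through a symmetric weighted sum, reordering the input only reorders the output rows in the same way (up to floating-point rounding). This permutation equivariance is one reason attention is a natural fit for unordered point sets.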