In this paper, we present Position-to-Structure Attention Transformers (PS-Former), a Transformer-based algorithm for 3D point cloud recognition. PS-Former addresses a central challenge in 3D point cloud representation: points are not arranged on a fixed grid and carry only a limited feature description (just the 3D coordinates $(x, y, z)$ of scattered points). Existing Transformer-based architectures in this domain often require a pre-specified feature engineering step to extract point features. Here, we introduce two new aspects in PS-Former: 1) a learnable condensation layer that performs point downsampling and feature extraction; and 2) a Position-to-Structure Attention mechanism that recursively enriches the structural information with the position attention branch. While being generic with fewer heuristic feature designs than competing methods, PS-Former demonstrates competitive experimental results on three 3D point cloud tasks: classification, part segmentation, and scene segmentation.
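The core idea sketched in the abstract, a position branch whose attention map enriches a structure (feature) branch, can be illustrated with a minimal numpy example. This is a hypothetical simplification for intuition only, not the paper's actual architecture: the function name, the dot-product position similarity, and the single-step enrichment are all assumptions; the paper applies this recursively inside a Transformer.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def position_to_structure_attention(xyz, feats):
    """Hypothetical sketch: the position branch attends over raw 3D
    coordinates, and its attention map re-weights the structure branch."""
    # Position attention: pairwise similarity of 3D coordinates (N x N),
    # scaled by sqrt of the coordinate dimension as in standard attention
    pos_attn = softmax(xyz @ xyz.T / np.sqrt(xyz.shape[1]))
    # Structure features enriched by the position attention map (N x C)
    return pos_attn @ feats

# Toy input: 16 scattered points with 3D coordinates and 8-dim features
rng = np.random.default_rng(0)
xyz = rng.standard_normal((16, 3))
feats = rng.standard_normal((16, 8))
out = position_to_structure_attention(xyz, feats)
print(out.shape)  # (16, 8): one enriched feature vector per point
```

Each output row is a position-weighted mixture of all point features, so structural information flows between points according to their spatial layout, which is the intuition behind the position-to-structure coupling described above.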