Recently, there have been attempts to apply Transformers to 3D point cloud classification. To reduce computation, most existing methods restrict self-attention to local spatial neighborhoods, but they ignore point content and fail to establish relationships between distant yet relevant points. To overcome this limitation of local spatial attention, we propose a point content-based Transformer architecture, called PointConT for short. It exploits the locality of points in the feature space (content-based), clustering sampled points with similar features into the same class and computing self-attention within each class, thereby achieving an effective trade-off between capturing long-range dependencies and computational complexity. We further introduce an Inception feature aggregator for point cloud classification, which uses parallel structures to aggregate high-frequency and low-frequency information in separate branches. Extensive experiments show that our PointConT model achieves remarkable performance on point cloud shape classification. In particular, our method reaches 90.3% Top-1 accuracy on the hardest setting of ScanObjectNN. Source code of this paper is available at https://github.com/yahuiliu99/PointConT.
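The core idea above can be sketched in a few lines: group points by feature similarity rather than spatial proximity, then run self-attention only within each group. The following is a minimal NumPy toy (not the paper's implementation), using plain k-means as the content-based grouping and unprojected scaled dot-product attention; all names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cluster_by_content(feats, k, iters=10):
    """Toy k-means in feature space: points with similar features share a class."""
    centers = feats[rng.choice(len(feats), k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(feats[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = feats[labels == c].mean(axis=0)
    return labels

def self_attention(x):
    """Plain scaled dot-product self-attention, no learned projections."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

# N points with C-dim features; attention runs per class, so the cost is
# roughly k * (N/k)^2 rather than N^2 for full global attention, while
# distant-but-similar points can still attend to each other.
N, C, k = 64, 8, 4
feats = rng.normal(size=(N, C))
labels = cluster_by_content(feats, k)
out = np.empty_like(feats)
for c in range(k):
    idx = np.where(labels == c)[0]
    out[idx] = self_attention(feats[idx])
```

The sketch only illustrates the trade-off: clustering is cheap, and each attention block is quadratic only in its cluster size.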