Recently developed pure Transformer architectures have attained promising accuracy on point cloud learning benchmarks compared to convolutional neural networks. However, existing point cloud Transformers are computationally expensive because they spend a significant amount of time structuring the irregular data. To address this shortcoming, we present the Sparse Window Attention (SWA) module, which gathers coarse-grained local features from non-empty voxels. SWA not only bypasses the expensive irregular data structuring and the invalid computation on empty voxels, but also achieves linear computational complexity with respect to voxel resolution. Meanwhile, to gather fine-grained features about the global shape, we introduce the relative attention (RA) module, a self-attention variant that is more robust to rigid transformations of objects. Equipped with SWA and RA, we construct a neural architecture called PVT that integrates both modules into a joint framework for point cloud learning. Compared with previous Transformer-based and attention-based models, our method attains a top accuracy of 94.0% on the classification benchmark and a 10x inference speedup on average. Extensive experiments also validate the effectiveness of PVT on part and semantic segmentation benchmarks (86.6% and 69.2% mIoU, respectively).
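To illustrate what restricting computation to non-empty voxels can look like, the following is a minimal NumPy sketch, not the PVT implementation. It assumes a cubic grid, non-overlapping cubic windows, and per-voxel centroids as stand-in features; the function names `nonempty_voxels` and `group_by_window` are hypothetical and chosen only for this example.

```python
import numpy as np


def nonempty_voxels(points, resolution):
    """Return occupied voxel indices and the mean point in each occupied voxel.

    Assumption: the cloud is scaled uniformly into a resolution^3 grid.
    """
    points = np.asarray(points, dtype=np.float64)
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    scale = extent if extent > 0 else 1.0
    normalized = (points - mins) / scale
    # Clip so that points on the upper boundary land in the last voxel.
    idx = np.minimum((normalized * resolution).astype(np.int64), resolution - 1)
    # Only occupied voxels are kept; empty voxels are never materialized.
    voxels, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    counts = np.bincount(inverse, minlength=len(voxels))
    centroids = np.zeros((len(voxels), 3))
    np.add.at(centroids, inverse, points)  # sum the points of each voxel
    centroids /= counts[:, None]
    return voxels, centroids


def group_by_window(voxels, window):
    """Bucket occupied voxels by the non-overlapping window containing them."""
    groups = {}
    for row, key in enumerate(voxels // window):
        groups.setdefault(tuple(int(v) for v in key), []).append(row)
    return groups


if __name__ == "__main__":
    cloud = np.random.default_rng(0).random((1024, 3))
    voxels, feats = nonempty_voxels(cloud, resolution=16)
    windows = group_by_window(voxels, window=4)
    print(len(voxels), "occupied voxels in", len(windows), "windows")
```

In a sketch like this, attention would be computed separately inside each window over its occupied voxels only, so the work per window is bounded by the window size rather than by the full grid.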