The point cloud learning community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have achieved top accuracy on the major learning benchmarks. However, existing point Transformers are computationally expensive because they need to generate a large attention map, which has quadratic complexity (both in space and time) with respect to the input size. To address this shortcoming, we introduce Patch attention (PAT), which adaptively learns a much smaller set of bases upon which the attention maps are computed. Via a weighted summation over these bases, PAT not only captures the global shape context but also achieves linear complexity with respect to the input size. In addition, we propose a lightweight Multi-scale attention (MST) block that builds attention among features of different scales, providing the model with multi-scale features. Equipped with PAT and MST, we construct our neural architecture, called PatchFormer, which integrates both modules into a joint framework for point cloud learning. Extensive experiments demonstrate that our network achieves comparable accuracy on general point cloud learning tasks with a 9.2x speed-up over previous point Transformers.
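To make the linear-complexity idea concrete, the following is a minimal sketch of bases-style attention in PyTorch: N point features are adaptively pooled into M learned bases (M much smaller than N) by a weighted summation, and attention is then computed between the points and these bases, so the attention map is N x M rather than N x N. The class and parameter names (PatchAttentionSketch, num_bases, base_scores) are illustrative assumptions, not the authors' exact PAT implementation.

```python
import torch
import torch.nn as nn


class PatchAttentionSketch(nn.Module):
    """Sketch of attention over a small set of adaptively learned bases."""

    def __init__(self, dim: int, num_bases: int = 32):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # scores used to adaptively pool N points into M bases (M << N)
        self.base_scores = nn.Linear(dim, num_bases, bias=False)
        self.scale = dim ** -0.5

    def forward(self, x):                       # x: (B, N, C) point features
        q = self.to_q(x)                        # (B, N, C)
        k = self.to_k(x)                        # (B, N, C)
        v = self.to_v(x)                        # (B, N, C)
        # weighted summation over the N points -> M adaptive bases
        w = self.base_scores(x).softmax(dim=1)  # (B, N, M), normalized over points
        k_bases = torch.einsum('bnm,bnc->bmc', w, k)  # (B, M, C)
        v_bases = torch.einsum('bnm,bnc->bmc', w, v)  # (B, M, C)
        # attention map is only N x M, so cost is O(N*M) instead of O(N^2)
        attn = (q @ k_bases.transpose(1, 2) * self.scale).softmax(dim=-1)  # (B, N, M)
        return attn @ v_bases                   # (B, N, C)


# usage: a batch of 2 clouds, 1024 points each, 64-dim features
x = torch.randn(2, 1024, 64)
y = PatchAttentionSketch(dim=64)(x)
print(y.shape)  # torch.Size([2, 1024, 64])
```

Because every point attends to only M bases, memory and compute grow linearly with the number of input points, while the bases still aggregate information from the whole cloud, which is how global shape context is retained.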