Self-attention modules have demonstrated remarkable capabilities in capturing long-range relationships and improving the performance of point cloud tasks. However, point cloud objects are typically characterized by complex, disordered, and non-Euclidean spatial structures at multiple scales, and their behavior is often dynamic and unpredictable. Current self-attention modules mostly rely on dot-product multiplication and dimension alignment among query-key-value features, which cannot adequately capture the multi-scale non-Euclidean structures of point cloud objects. To address these problems, this paper proposes a self-attention plug-in module and its variants, termed the Multi-scale Geometry-aware Transformer (MGT). MGT processes point cloud data with multi-scale local and global geometric information in three aspects. First, MGT divides the point cloud into patches at multiple scales. Second, a local feature extractor based on sphere mapping is proposed to explore the geometry within each patch and generate a fixed-length representation per patch. Third, these fixed-length representations are fed into a novel geodesic-based self-attention module to capture the global non-Euclidean geometry between patches. Finally, all the modules are integrated into the MGT framework with an end-to-end training scheme. Experimental results demonstrate that MGT substantially improves the ability of the self-attention mechanism to capture multi-scale geometry and achieves highly competitive performance on mainstream point cloud benchmarks.
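The sketch below illustrates, in plain PyTorch, the three-step pipeline the abstract describes: multi-scale patch partition, a per-patch local feature extractor producing fixed-length tokens, and a self-attention over patch tokens whose scores are biased by inter-patch distances. It is a minimal illustration under stated assumptions, not the paper's implementation: the sphere-mapping extractor is replaced by a simple PointNet-style shared MLP with max pooling, the geodesic distance is replaced by the Euclidean distance between patch centers, and all module names, layer sizes, and scale settings are hypothetical.

```python
# Minimal sketch of the pipeline described in the abstract (assumptions noted above).
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_patches(points, num_patches, k):
    """Group a point cloud (B, N, 3) into `num_patches` patches of `k` points
    by picking random centers and taking their k nearest neighbors."""
    B, N, _ = points.shape
    center_idx = torch.stack(
        [torch.randperm(N, device=points.device)[:num_patches] for _ in range(B)])   # (B, P)
    centers = torch.gather(points, 1, center_idx.unsqueeze(-1).expand(-1, -1, 3))     # (B, P, 3)
    knn_idx = torch.cdist(centers, points).topk(k, largest=False).indices             # (B, P, k)
    patches = torch.gather(
        points.unsqueeze(1).expand(-1, num_patches, -1, -1),
        2, knn_idx.unsqueeze(-1).expand(-1, -1, -1, 3))                                # (B, P, k, 3)
    return patches, centers


class LocalExtractor(nn.Module):
    """Fixed-length patch embedding: shared MLP on center-relative coordinates,
    then max pooling over the points of each patch (stand-in for sphere mapping)."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, patches, centers):
        rel = patches - centers.unsqueeze(2)              # center each patch
        return self.mlp(rel).max(dim=2).values            # (B, P, dim)


class DistanceBiasedAttention(nn.Module):
    """Self-attention over patch tokens whose logits are penalized by pairwise
    patch-center distances (Euclidean here, standing in for geodesic distances)."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.scale = dim ** -0.5
        self.alpha = nn.Parameter(torch.tensor(1.0))      # learnable bias strength

    def forward(self, tokens, centers):
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        logits = (q @ k.transpose(-2, -1)) * self.scale   # (B, P, P)
        logits = logits - self.alpha * torch.cdist(centers, centers)
        return F.softmax(logits, dim=-1) @ v


class MGTSketch(nn.Module):
    """One multi-scale block: patchify at several scales, embed each patch,
    run distance-biased attention per scale, and pool to a global feature."""
    def __init__(self, dim=128, scales=((32, 16), (16, 32)), num_classes=40):
        super().__init__()
        self.scales = scales                              # (num_patches, points_per_patch)
        self.extractor = LocalExtractor(dim)
        self.attn = DistanceBiasedAttention(dim)
        self.head = nn.Linear(dim * len(scales), num_classes)

    def forward(self, points):                            # points: (B, N, 3)
        feats = []
        for num_patches, k in self.scales:
            patches, centers = make_patches(points, num_patches, k)
            tokens = self.extractor(patches, centers)
            tokens = self.attn(tokens, centers)
            feats.append(tokens.max(dim=1).values)        # pool over patches
        return self.head(torch.cat(feats, dim=-1))


if __name__ == "__main__":
    model = MGTSketch()
    logits = model(torch.randn(2, 1024, 3))
    print(logits.shape)                                   # torch.Size([2, 40])
```

The distance penalty on the attention logits is one simple way to make attention geometry-aware at the global level; the actual MGT design (sphere mapping, geodesic computation, and plug-in variants) is described in the body of the paper.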