Transformer-based networks have achieved impressive performance in 3D point cloud understanding. However, most of them concentrate on aggregating local features while neglecting to directly model global dependencies, which results in a limited effective receptive field. In addition, how to effectively combine local and global components remains challenging. To tackle these problems, we propose the Asymmetric Parallel Point Transformer (APPT). Specifically, we introduce Global Pivot Attention to extract global features and enlarge the effective receptive field. Moreover, we design an Asymmetric Parallel structure to effectively integrate local and global information. Combined with these designs, APPT is able to capture features globally throughout the entire network while focusing on fine-grained local details. Extensive experiments show that our method outperforms prior approaches and achieves state-of-the-art results on several benchmarks for 3D point cloud understanding, including 3D semantic segmentation on S3DIS, 3D shape classification on ModelNet40, and 3D part segmentation on ShapeNet.
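The abstract names Global Pivot Attention but does not give its formulation. As a rough illustration of the general idea — letting every point attend to a small set of pivot tokens so that global mixing costs O(N·M) rather than O(N²) — the following is a minimal numpy sketch. The pivot-selection rule (uniform striding) and all function names here are placeholders of our own, not the method's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_pivot_attention(feats, num_pivots=4):
    """Hypothetical sketch: points attend to a few pivot tokens.

    feats: (N, C) per-point features.
    Pivots are chosen by uniform striding here as a stand-in for
    whatever (possibly learned) selection the actual method uses.
    """
    N, C = feats.shape
    idx = np.linspace(0, N - 1, num_pivots).astype(int)
    pivots = feats[idx]                              # (M, C) global summary tokens
    attn = softmax(feats @ pivots.T / np.sqrt(C))    # (N, M) attention over pivots
    return attn @ pivots                             # (N, C) globally mixed features

rng = np.random.default_rng(0)
out = global_pivot_attention(rng.normal(size=(128, 16)))
print(out.shape)  # (128, 16)
```

Because each point only interacts with the M pivots, the effective receptive field spans the whole cloud at a cost linear in the number of points.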