The transformer, as an alternative to the CNN, has proven effective across many modalities (e.g., text and images). For 3D point cloud transformers, existing efforts have focused primarily on pushing accuracy to the state-of-the-art level. However, their latency lags behind that of sparse convolution-based models (3x slower), hindering their use in resource-constrained, latency-sensitive applications such as autonomous driving. This inefficiency stems from the sparse and irregular nature of point clouds, whereas transformers are designed for dense, regular workloads. This paper presents FlatFormer, which closes this latency gap by trading spatial proximity for computational regularity. We first flatten the point cloud with window-based sorting and partition the points into groups of equal sizes rather than windows of equal shapes, which avoids expensive structuring and padding overheads. We then apply self-attention within each group to extract local features, alternate the sorting axis to gather features from different directions, and shift windows to exchange features across groups. FlatFormer delivers state-of-the-art accuracy on the Waymo Open Dataset with a 4.6x speedup over (transformer-based) SST and a 1.4x speedup over (sparse convolutional) CenterPoint. It is the first point cloud transformer to achieve real-time performance on edge GPUs, running faster than sparse convolutional methods while matching or exceeding their accuracy on large-scale benchmarks. Code to reproduce our results will be made publicly available.
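To make the flatten-and-group step concrete, below is a minimal NumPy sketch of window-based sorting followed by partitioning into equal-size groups, as described above. The function name `flatten_and_group`, the default window and group sizes, and the tail-padding strategy are illustrative assumptions for exposition, not the paper's released implementation.

```python
import numpy as np

def flatten_and_group(coords, window_size=(8, 8), group_size=64, major_axis=0):
    """Sketch of flattening a sparse point cloud with window-based sorting,
    then partitioning into groups of equal SIZE (not windows of equal shape).

    coords     : (N, 2) integer BEV voxel coordinates of non-empty voxels
                 (hypothetical input format assumed for this sketch).
    major_axis : axis sorted first; alternating it across blocks gathers
                 features from different directions.
    Returns the flattened point order and (num_groups, group_size) index groups.
    """
    coords = np.asarray(coords)
    win = coords // np.asarray(window_size)      # window each point falls into
    minor = 1 - major_axis
    # np.lexsort treats its LAST key as most significant, so this orders points
    # by window first, then by in-window coordinate along the chosen axis.
    order = np.lexsort((coords[:, minor], coords[:, major_axis],
                        win[:, minor], win[:, major_axis]))
    # Pad only the tail (here by repeating the last index) so every group has
    # exactly group_size points: a regular, fully batchable workload with no
    # per-window structuring or padding overhead.
    pad = (-len(order)) % group_size
    padded = np.pad(order, (0, pad), mode="edge")
    return order, padded.reshape(-1, group_size)

# Illustrative usage: self-attention would then be applied within each group,
# and shifting the window origin between blocks exchanges features across groups.
coords = np.random.randint(0, 64, size=(1000, 2))
order, groups = flatten_and_group(coords, major_axis=0)  # sort along one axis
order, groups = flatten_and_group(coords, major_axis=1)  # next block: the other
```

The key design choice this sketch illustrates is that group boundaries follow the flattened ordering rather than spatial window boundaries, so every group has an identical point count and maps directly onto dense attention kernels.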