Due to the sparsity and irregularity of 3D data, approaches that directly process points have become popular. Among point-based models, Transformer-based models have achieved state-of-the-art performance by fully preserving point interrelation. However, most of them spend a high percentage of their total runtime on sparse data accessing (e.g., Farthest Point Sampling (FPS) and neighbor points query), which becomes the computation burden. Therefore, we present a novel 3D Transformer, called Point-Voxel Transformer (PVT), that leverages self-attention computation on points to gather global context features, while performing multi-head self-attention (MSA) computation on voxels to capture local information and reduce irregular data access. Additionally, to further reduce the cost of MSA computation, we design a cyclic shifted boxing scheme, which brings greater efficiency by limiting MSA computation to non-overlapping local boxes while preserving cross-box connections. Our method fully exploits the potential of the Transformer architecture, paving the road to efficient and accurate recognition. Evaluated on classification and segmentation benchmarks, our PVT not only achieves strong accuracy but also outperforms previous state-of-the-art Transformer-based models with a 9× measured speedup on average. For the 3D object detection task, we replace the primitives in Frustum PointNet with our PVT layer and achieve an improvement of 8.6%.
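To make the cyclic shifted boxing scheme concrete, below is a minimal PyTorch sketch, not the paper's implementation: the helper names (`partition_boxes`, `cyclic_shift`), the dense voxel layout `[B, D, H, W, C]`, and the `box_size`/`shift` values are illustrative assumptions. It shows the core idea described above: restrict MSA to non-overlapping local boxes, and cyclically roll the voxel grid in alternating layers so the new boxes straddle the previous box boundaries, preserving cross-box connections.

```python
# Minimal sketch of the cyclic shifted boxing idea (assumed helper names and
# tensor layout; not the authors' code). Analogous to shifted windows in 2D,
# applied to a dense 3D voxel grid.
import torch

def partition_boxes(voxels: torch.Tensor, box_size: int) -> torch.Tensor:
    """Split a dense voxel grid [B, D, H, W, C] into non-overlapping boxes
    of shape [num_boxes * B, box_size**3, C], ready for per-box MSA."""
    B, D, H, W, C = voxels.shape
    s = box_size
    x = voxels.view(B, D // s, s, H // s, s, W // s, s, C)
    x = x.permute(0, 1, 3, 5, 2, 4, 6, 7).contiguous()
    return x.view(-1, s * s * s, C)

def cyclic_shift(voxels: torch.Tensor, shift: int) -> torch.Tensor:
    """Cyclically roll the grid along D, H, W so that the next partition
    straddles the previous box boundaries, connecting neighboring boxes."""
    return torch.roll(voxels, shifts=(-shift, -shift, -shift), dims=(1, 2, 3))

# Usage: alternate regular and shifted partitions across successive MSA layers.
voxels = torch.randn(2, 8, 8, 8, 32)                  # [B, D, H, W, C], toy sizes
boxes = partition_boxes(voxels, box_size=4)           # layer i: regular boxes
shifted = partition_boxes(cyclic_shift(voxels, 2), 4) # layer i+1: shifted boxes
print(boxes.shape, shifted.shape)                     # both: [16, 64, 32]
```

Because each box holds only `box_size**3` voxels, attention cost per layer is linear in the number of boxes rather than quadratic in the full grid, which is where the claimed efficiency of restricting MSA to local boxes comes from.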