The recent success of neural networks enables a better interpretation of 3D point clouds, but processing a large-scale 3D scene remains a challenging problem. Most current approaches divide a large-scale scene into small regions and combine the local predictions together. However, this scheme inevitably involves additional stages for pre- and post-processing and may also degrade the final output because the predictions are made from a local perspective. This paper introduces Fast Point Transformer, which is built on a new lightweight self-attention layer. Our approach encodes continuous 3D coordinates, and its voxel hashing-based architecture boosts computational efficiency. The proposed method is demonstrated on 3D semantic segmentation and 3D detection. Our approach is competitive in accuracy with the best voxel-based method, and our network achieves 136 times faster inference than the state-of-the-art Point Transformer with a reasonable accuracy trade-off.
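To make the voxel hashing idea concrete, below is a minimal illustrative sketch of how continuous 3D coordinates can be quantized into voxel indices and stored in a hash table. This is not the paper's implementation; the function name `voxelize` and the default `voxel_size` are assumptions for illustration only.

```python
import numpy as np

def voxelize(points, voxel_size=0.05):
    """Quantize continuous 3D coordinates into integer voxel indices
    and group points per voxel with a hash map (a Python dict).
    `points` is an (N, 3) float array; `voxel_size` is an assumed value."""
    indices = np.floor(points / voxel_size).astype(np.int64)
    table = {}  # voxel index triple -> list of point ids in that voxel
    for i, idx in enumerate(map(tuple, indices)):
        table.setdefault(idx, []).append(i)
    return table

pts = np.array([[0.01, 0.02, 0.03],
                [0.02, 0.01, 0.04],   # falls in the same voxel as the first
                [0.30, 0.40, 0.50]])
table = voxelize(pts)
# two occupied voxels: one holding points 0 and 1, another holding point 2
```

A hash table keyed by voxel indices gives constant-time lookup of occupied voxels without allocating a dense 3D grid, which is what makes hashing attractive for large, sparse scenes.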