Self-attention networks have revolutionized natural language processing and are making impressive strides in image analysis tasks such as image classification and object detection. Inspired by this success, we investigate the application of self-attention networks to 3D point cloud processing. We design self-attention layers for point clouds and use these to construct self-attention networks for tasks such as semantic scene segmentation, object part segmentation, and object classification. Our Point Transformer design improves upon prior work across domains and tasks. For example, on the challenging S3DIS dataset for large-scale semantic scene segmentation, the Point Transformer attains an mIoU of 70.4% on Area 5, outperforming the strongest prior model by 3.3 absolute percentage points and crossing the 70% mIoU threshold for the first time.
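The abstract does not say how the point-cloud self-attention layer is built, so the following is only a rough illustration of one generic way to apply self-attention locally on a point cloud. Each point attends to its k nearest neighbours, and a linear encoding of the relative position p_i − p_j is added to the keys and values. Several choices here are assumptions for illustration and are not the Point Transformer layer itself: the scalar dot-product attention, the function names (`knn_indices`, `local_self_attention`), the weight matrices `Wq`, `Wk`, `Wv`, `Wp`, and the neighbourhood size `k`.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def knn_indices(points: Matrix, i: int, k: int) -> List[int]:
    """Indices of the k points closest to points[i] (itself included), by brute force."""
    def sq_dist(j: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    return sorted(range(len(points)), key=sq_dist)[:k]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a row-major matrix W by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_self_attention(points: Matrix, feats: Matrix, Wq: Matrix, Wk: Matrix,
                         Wv: Matrix, Wp: Matrix, k: int) -> Matrix:
    """Hypothetical local self-attention layer over k-nearest-neighbour sets.

    points: N x 3 coordinates; feats: N x C input features.
    Wq, Wk, Wv: D x C projections; Wp: D x 3 relative-position encoding.
    Returns N x D output features.
    """
    out = []
    for i in range(len(points)):
        q = matvec(Wq, feats[i])
        scores, values = [], []
        for j in knn_indices(points, i, k):
            rel = [a - b for a, b in zip(points[i], points[j])]
            delta = matvec(Wp, rel)  # linear encoding of p_i - p_j
            key = [a + d for a, d in zip(matvec(Wk, feats[j]), delta)]
            scores.append(sum(a * b for a, b in zip(q, key)) / math.sqrt(len(q)))
            values.append([a + d for a, d in zip(matvec(Wv, feats[j]), delta)])
        weights = softmax(scores)  # attention weights over the neighbourhood
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out
```

Restricting attention to a local neighbourhood keeps the number of attention scores at O(Nk) for N points instead of O(N²), which matters for large scenes such as those in S3DIS. The brute-force neighbour search used here is itself O(N² log N) and would normally be replaced by a spatial index.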