Following the tremendous success of transformers in natural language processing and image understanding tasks, in this paper we present a novel point cloud representation learning architecture, named Dual Transformer Network (DTNet), whose core building block is the Dual Point Cloud Transformer (DPCT) module. Specifically, by simultaneously aggregating well-designed point-wise and channel-wise multi-head self-attention models, the DPCT module can capture much richer semantic contextual dependencies from both the position and the channel perspective. With the DPCT module as a fundamental component, we construct DTNet to perform point cloud analysis in an end-to-end manner. Extensive quantitative and qualitative experiments on publicly available benchmarks demonstrate the effectiveness of our proposed transformer framework for 3D point cloud classification and segmentation, achieving highly competitive performance compared with state-of-the-art approaches.
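The dual attention idea above can be illustrated with a minimal sketch: point-wise self-attention computes affinities between the N points (an N×N map), while channel-wise self-attention computes affinities between the C feature channels (a C×C map), and the two branch outputs are fused. This is a single-head NumPy simplification for illustration only; the learned query/key/value projections, multi-head splitting, and the sum fusion used here are assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def point_wise_attention(F):
    # F: (N, C) point features; attend over the N point positions
    A = softmax(F @ F.T, axis=-1)   # (N, N) point-to-point affinity map
    return A @ F                     # (N, C) position-refined features

def channel_wise_attention(F):
    # attend over the C feature channels instead of the points
    A = softmax(F.T @ F, axis=-1)   # (C, C) channel affinity map
    return F @ A                     # (N, C) channel-refined features

def dual_attention(F):
    # fuse both branches (sum fusion is an assumed choice here)
    return point_wise_attention(F) + channel_wise_attention(F)

rng = np.random.default_rng(0)
F = rng.standard_normal((1024, 64))  # 1024 points, 64-dim features
out = dual_attention(F)
print(out.shape)
```

Both branches preserve the (N, C) feature shape, so the module can be stacked or dropped into an encoder in place of a plain feature transform.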