The remarkable performance of Transformer networks in natural language processing has promoted their adoption for computer vision tasks such as image recognition and segmentation. In this paper, we introduce a novel framework, called Multi-level Multi-scale Point Transformer (MLMSPT), that works directly on irregular point clouds for representation learning. Specifically, a point pyramid transformer models features at the diverse resolutions, or scales, that we define. It is followed by a multi-level transformer module that aggregates contextual information from different levels of each scale and enhances their interactions. In addition, a multi-scale transformer module is designed to capture the dependencies among representations across different scales. Extensive evaluation on public benchmark datasets demonstrates the effectiveness and competitive performance of our methods on 3D shape classification and segmentation tasks.
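To make the multi-scale idea concrete, the following is a minimal NumPy sketch and not the MLMSPT implementation. The abstract does not specify how the scales are built or how the attention modules are parameterized, so everything here is an assumption for illustration: farthest point sampling to form the pyramid, a single-head scaled dot-product self-attention with random (untrained) weights shared across modules, max-pooling to obtain one token per scale, and the hypothetical names `farthest_point_sampling`, `self_attention`, and `multi_scale_features`. The multi-level module is omitted.

```python
import numpy as np

def farthest_point_sampling(points, num_samples):
    """Greedily pick indices so each new point is farthest from those already chosen."""
    chosen = [0]  # arbitrary starting point (assumption)
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(num_samples - 1):
        idx = int(np.argmax(dist))
        chosen.append(idx)
        # Track each point's distance to its nearest chosen point.
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return np.array(chosen)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v

def multi_scale_features(points, scale_sizes, dim=16, seed=0):
    """Return one feature vector per scale after within-scale and cross-scale attention."""
    rng = np.random.default_rng(seed)
    # Random, untrained projections; a real model would learn these.
    embed = rng.normal(scale=0.1, size=(3, dim))
    wq, wk, wv = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
    tokens = []
    for size in scale_sizes:
        subset = points[farthest_point_sampling(points, size)]
        feats = self_attention(subset @ embed, wq, wk, wv)  # attention within one scale
        tokens.append(feats.max(axis=0))  # max-pool the scale into a single token
    tokens = np.stack(tokens)
    return self_attention(tokens, wq, wk, wv)  # attention across scales

cloud = np.random.default_rng(1).normal(size=(128, 3))
scale_tokens = multi_scale_features(cloud, scale_sizes=[64, 32, 16])
```

Here `scale_tokens` holds one row per scale, each row mixing information from the other scales through the final attention step. A classifier head or per-point decoder would sit on top of such features; those parts are outside this sketch.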