Spatial-Temporal Transformer for 3D Point Cloud Sequences

Effective learning of spatial-temporal information within a point cloud sequence is important for many downstream tasks such as 4D semantic segmentation and 3D action recognition. In this paper, we propose a novel framework named Point Spatial-Temporal Transformer (PST2) to learn spatial-temporal representations from dynamic 3D point cloud sequences. PST2 consists of two major modules: a Spatio-Temporal Self-Attention (STSA) module and a Resolution Embedding (RE) module. The STSA module captures spatial-temporal context information across adjacent frames, while the RE module aggregates features across neighbors to enhance the resolution of feature maps. We test the effectiveness of PST2 on two different tasks on point cloud sequences, i.e., 4D semantic segmentation and 3D action recognition. Extensive experiments on three benchmarks show that PST2 outperforms existing methods on all datasets. The effectiveness of the STSA and RE modules has also been verified with ablation experiments.
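The abstract does not spell out how the STSA module is built, so the following is only a rough illustration of the general idea of attending over points from adjacent frames. It uses generic scaled dot-product self-attention with identity query/key/value projections and no learned weights; the function names and the `window` parameter are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    tokens: list of equal-length feature vectors, one per point.
    Each output vector is a weighted average of all input vectors.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, tokens))
            for d in range(dim)
        ])
    return outputs


def attend_across_frames(frames, window=1):
    """For each frame t, attend over the points of frames t-window..t+window.

    frames: list of frames, each a list of per-point feature vectors.
    Only the outputs that belong to frame t are kept, so the result has
    the same layout as the input. `window` is an assumption of this sketch.
    """
    results = []
    for t, frame in enumerate(frames):
        lo = max(0, t - window)
        hi = min(len(frames), t + window + 1)
        tokens = [point for f in frames[lo:hi] for point in f]
        # Position of frame t's first point inside the flattened token list.
        offset = sum(len(f) for f in frames[lo:t])
        attended = self_attention(tokens)
        results.append(attended[offset:offset + len(frame)])
    return results


# Toy sequence: three frames with 2, 1, and 2 points (raw XYZ as features).
frames = [
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]],
]
out = attend_across_frames(frames)
print([len(f) for f in out])  # → [2, 1, 2]
```

A real model would typically add learned projections, multiple heads, and positional or temporal encodings, and would run on batched tensors rather than Python lists.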

Point cloud

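A point cloud obtained by laser measurement contains 3D coordinates (XYZ) and laser reflection intensity (Intensity). A point cloud obtained by photogrammetry contains 3D coordinates (XYZ) and color information (RGB). A point cloud obtained by combining laser measurement and photogrammetry contains 3D coordinates (XYZ), laser reflection intensity (Intensity), and color information (RGB). Once the spatial coordinates of every sample point on an object's surface have been acquired, the result is a set of points, which is called a "point cloud".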
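As a small illustration of the three kinds of point cloud described above, the sketch below models a single point whose intensity and color attributes are optional. The `Point` class and the `source_kind` helper are hypothetical names chosen for this example; real point cloud libraries usually store attributes as arrays rather than per-point objects.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Point:
    """One sample point; only the XYZ coordinates are mandatory."""
    xyz: Tuple[float, float, float]
    intensity: Optional[float] = None          # present for laser measurement
    rgb: Optional[Tuple[int, int, int]] = None  # present for photogrammetry


def source_kind(point: Point) -> str:
    """Classify a point by which optional attributes it carries."""
    has_intensity = point.intensity is not None
    has_color = point.rgb is not None
    if has_intensity and has_color:
        return "laser + photogrammetry"
    if has_intensity:
        return "laser"
    if has_color:
        return "photogrammetry"
    return "coordinates only"


# A point cloud is simply a collection of such points.
cloud = [
    Point((0.0, 0.0, 0.0), intensity=0.8),
    Point((1.0, 0.5, 0.2), rgb=(255, 128, 0)),
    Point((2.0, 1.0, 0.4), intensity=0.3, rgb=(10, 20, 30)),
]
print([source_kind(p) for p in cloud])
# → ['laser', 'photogrammetry', 'laser + photogrammetry']
```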
