Point cloud sequences are irregular and unordered in the spatial dimension, while exhibiting regularity and order in the temporal dimension. Therefore, existing grid-based convolutions designed for conventional video processing cannot be directly applied to spatio-temporal modeling of raw point cloud sequences. In this paper, we propose a point spatio-temporal (PST) convolution to learn informative representations of point cloud sequences. The proposed PST convolution first disentangles space and time in point cloud sequences. A spatial convolution then captures the local structure of points in 3D space, and a temporal convolution models the dynamics of the spatial regions along the time dimension. Furthermore, we incorporate the proposed PST convolution into a deep network, namely PSTNet, to extract features of point cloud sequences in a hierarchical manner. Extensive experiments on widely used 3D action recognition and 4D semantic segmentation datasets demonstrate the effectiveness of PSTNet in modeling point cloud sequences.
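The space/time decoupling described above can be sketched numerically. The following is a minimal, illustrative NumPy sketch, not the paper's implementation: per frame, each point aggregates its k nearest neighbors through a shared linear map on (displacement, neighbor feature) followed by max-pooling (the spatial convolution), and the resulting per-point features are then convolved with a 1D kernel across frames (the temporal convolution). For simplicity it assumes point correspondence across frames, whereas the paper instead tracks spatial anchor regions; the weight shapes and the single linear map are illustrative assumptions.

```python
import numpy as np

def spatial_conv(points, feats, k, W):
    """One frame: aggregate each point's k-NN via a shared linear map + max-pool.

    points: (N, 3) coordinates; feats: (N, C) features; W: (3 + C, C_out).
    Returns (N, C_out) per-point spatial features.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)  # (N, N) pairwise distances
    idx = np.argsort(d, axis=1)[:, :k]                                    # (N, k) neighbor indices (incl. self)
    disp = points[idx] - points[:, None, :]                               # (N, k, 3) local displacements
    h = np.concatenate([disp, feats[idx]], axis=-1) @ W                   # (N, k, C_out) encoded neighbors
    return np.maximum(h, 0).max(axis=1)                                   # ReLU, then max-pool over neighborhood

def temporal_conv(seq_feats, Wt):
    """Valid 1D convolution over the time axis, shared across points.

    seq_feats: (T, N, C_out); Wt: (kt, C_out, C_out) temporal kernel.
    Returns (T - kt + 1, N, C_out).
    """
    T, kt = seq_feats.shape[0], Wt.shape[0]
    return np.stack([
        sum(seq_feats[t + j] @ Wt[j] for j in range(kt))
        for t in range(T - kt + 1)
    ])

# Toy usage with random data (all sizes are illustrative).
rng = np.random.default_rng(0)
T, N, C, C_out, k, kt = 4, 32, 8, 16, 8, 3
pts = rng.standard_normal((T, N, 3))
fts = rng.standard_normal((T, N, C))
W = rng.standard_normal((3 + C, C_out)) * 0.1
Wt = rng.standard_normal((kt, C_out, C_out)) * 0.1

spatial = np.stack([spatial_conv(pts[t], fts[t], k, W) for t in range(T)])  # (4, 32, 16)
out = temporal_conv(spatial, Wt)                                            # (2, 32, 16)
```

Decoupling the two steps keeps the spatial convolution permutation-invariant within each frame (neighbor order does not matter after max-pooling), while the temporal kernel exploits the ordered time axis.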