Skeleton data, which consists of only the 2D/3D coordinates of the human joints, has been widely studied for human action recognition. Existing methods take semantics as prior knowledge to group human joints and draw correlations according to their spatial locations, which we call the semantic perspective for skeleton modeling. In this paper, in contrast to previous approaches, we propose to model skeletons from a novel spatial perspective, in which the model takes spatial location as prior knowledge to group human joints and mines the discriminative patterns of local areas in a hierarchical manner. The two perspectives are orthogonal and complementary to each other, and by fusing them in a unified framework, our method achieves a more comprehensive understanding of the skeleton data. In addition, we customize a network for each perspective. From the semantic perspective, we propose a Transformer-like network that excels at modeling joint correlations, and present three effective techniques to adapt it to skeleton data. From the spatial perspective, we transform the skeleton data into a sparse format for efficient feature extraction and present two types of sparse convolutional networks for sparse skeleton modeling. Extensive experiments are conducted on three challenging datasets for skeleton-based human action/gesture recognition, namely NTU-60, NTU-120 and SHREC, where our method achieves state-of-the-art performance.
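To make the idea of a "sparse format" for skeletons concrete, the following is a minimal sketch, not the authors' implementation. It assumes one plausible reading: 3D joint coordinates are quantized onto a regular voxel grid, and only the occupied voxels are stored, which is the kind of input sparse convolutional networks typically consume. The function name `sparsify_skeleton`, the `voxel_size` parameter, and the choice of joint indices as per-voxel features are all hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def sparsify_skeleton(joints, voxel_size=0.5):
    """Quantize 3D joints onto a voxel grid and keep only occupied voxels.

    joints: list of (x, y, z) tuples, one per joint (hypothetical input layout).
    Returns (coords, feats), where coords is a sorted list of occupied integer
    voxel indices and feats[i] lists the joint indices falling in coords[i].
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    if not joints:
        return [], []

    # Shift so the minimum corner of the bounding box maps to voxel (0, 0, 0).
    mins = [min(joint[axis] for joint in joints) for axis in range(3)]

    voxels = defaultdict(list)
    for idx, joint in enumerate(joints):
        key = tuple(
            int((joint[axis] - mins[axis]) // voxel_size) for axis in range(3)
        )
        voxels[key].append(idx)

    # Only occupied voxels are stored, so memory scales with the number of
    # joints rather than with the volume of the grid.
    coords = sorted(voxels)
    feats = [voxels[c] for c in coords]
    return coords, feats


# Example: two nearby joints share a voxel, a distant joint gets its own.
coords, feats = sparsify_skeleton(
    [(0.0, 0.0, 0.0), (0.05, 0.02, 0.01), (1.0, 0.5, 0.25)], voxel_size=0.5
)
```

Grouping joints by voxel rather than by body part is what distinguishes this spatial view from the semantic one: joints that happen to be close in space end up in the same local area regardless of their anatomical role.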