The performance of skeleton-based action recognition methods is limited by how well they extract semantics from spatio-temporal skeletal maps. Current methods have difficulty effectively combining features from the temporal and spatial graph dimensions, and tend to emphasize one at the expense of the other. In this paper, we propose a Temporal-Channel Aggregation Graph Convolutional Network (TCA-GCN) that learns spatial and temporal topologies dynamically and efficiently aggregates topological features across different temporal and channel dimensions for skeleton-based action recognition. We use the Temporal Aggregation module to learn temporal dimensional features, and the Channel Aggregation module to efficiently combine the spatial dynamic topological features learned channel-wise with the temporal dynamic topological features. In addition, we extract multi-scale skeletal features in temporal modeling and fuse them with prior skeletal knowledge using an attention mechanism. Extensive experiments show that our model outperforms state-of-the-art methods on the NTU RGB+D, NTU RGB+D 120, and NW-UCLA datasets.
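To make the idea of combining a prior skeleton graph with dynamically derived temporal and channel-wise topologies more concrete, the following is a minimal NumPy sketch. It is not the TCA-GCN implementation: the function names (`normalize_adjacency`, `pairwise_refinement`, `aggregate_features`), the `tanh` of pairwise joint differences, the weights `alpha` and `beta`, and the simple additive combination are all assumptions chosen for illustration, and nothing here is learned.

```python
import numpy as np


def normalize_adjacency(A):
    """Add self-loops and row-normalize a (V, V) skeleton adjacency matrix."""
    A_hat = A + np.eye(A.shape[0])
    return A_hat / A_hat.sum(axis=1, keepdims=True)


def pairwise_refinement(feat):
    """Map (N, V) joint features to (N, V, V) pairwise joint relations.

    Hypothetical choice: tanh of feature differences between joints.
    """
    return np.tanh(feat[:, :, None] - feat[:, None, :])


def aggregate_features(x, A_prior, alpha=0.1, beta=0.1):
    """Aggregate joint features with a prior graph plus dynamic refinements.

    x:       (C, T, V) features (channels, frames, joints)
    A_prior: (V, V) normalized skeleton adjacency (prior knowledge)
    alpha, beta: illustrative weights for the two refinements (assumed)
    """
    channel_ref = pairwise_refinement(x.mean(axis=1))   # (C, V, V), pooled over time
    temporal_ref = pairwise_refinement(x.mean(axis=0))  # (T, V, V), pooled over channels
    # Broadcast to one topology per (channel, frame) pair: (C, T, V, V).
    topology = (A_prior[None, None]
                + alpha * channel_ref[:, None]
                + beta * temporal_ref[None])
    # For each channel and frame, mix joint features along the graph.
    return np.einsum("ctuv,ctv->ctu", topology, x)


# Toy example: 3 channels, 4 frames, 5 joints connected in a chain.
V = 5
A = np.zeros((V, V))
for i in range(V - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_prior = normalize_adjacency(A)

x = np.random.default_rng(0).normal(size=(3, 4, V))
out = aggregate_features(x, A_prior)
print(out.shape)  # (3, 4, 5)
```

With `alpha` and `beta` set to zero the sketch reduces to a plain graph convolution over the fixed skeleton, which shows the role of the dynamic terms: they let each channel and each frame deviate from the shared prior topology.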