In recent years, graph convolutional networks (GCNs) have played an increasingly critical role in skeleton-based human action recognition. However, most GCN-based methods still have two main limitations: 1) They either consider only the motion information of the joints or process the joints and bones separately, and are therefore unable to fully explore the latent functional correlation between joints and bones for action recognition. 2) Most of these works are trained in a supervised manner, which relies heavily on massive labeled training data. To address these issues, we propose a semi-supervised skeleton-based action recognition method, a setting that has rarely been explored before. We design a novel correlation-driven joint-bone fusion graph convolutional network (CD-JBF-GCN) as an encoder and use a pose prediction head as a decoder to achieve semi-supervised learning. Specifically, the CD-JBF-GC can explore the motion transmission between the joint stream and the bone stream, thereby encouraging both streams to learn more discriminative feature representations. The pose-prediction-based auto-encoder in the self-supervised training stage allows the network to learn motion representations from unlabeled data, which is essential for action recognition. Extensive experiments on two popular datasets, i.e., NTU-RGB+D and Kinetics-Skeleton, demonstrate that our model achieves state-of-the-art performance for semi-supervised skeleton-based action recognition and is also useful for fully supervised methods.
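To make the two ingredients named above concrete, the following is a minimal sketch of how a joint stream and a bone stream can be derived from the same skeleton sequence, and of what a pose-prediction objective on unlabeled clips might look like. It is not the CD-JBF-GCN itself. The 5-joint skeleton in `PARENTS`, the child-minus-parent bone definition, the repeat-last-pose predictor, and the mean-squared-error loss are all assumptions chosen for illustration; none of them is specified in the abstract.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton (an assumption, not a dataset layout):
# PARENTS[i] is the parent of joint i; the root (joint 0) is its own parent.
PARENTS = [0, 0, 1, 2, 2]


def joints_to_bones(joints, parents=PARENTS):
    """Turn joint coordinates of shape (T, V, C) into bone vectors.

    Each bone is assumed to be the child joint minus its parent joint,
    so the root bone is the zero vector.
    """
    joints = np.asarray(joints, dtype=np.float64)
    return joints - joints[:, parents, :]


def split_clip(joints, num_observed):
    """Split a clip into observed frames and the future frames to predict."""
    joints = np.asarray(joints, dtype=np.float64)
    return joints[:num_observed], joints[num_observed:]


def repeat_last_pose(observed, num_future):
    """Stand-in for an encoder plus pose prediction head: repeat the last pose."""
    return np.repeat(observed[-1:], num_future, axis=0)


def pose_prediction_loss(predicted, target):
    """Mean squared error between predicted and true future poses (assumed loss)."""
    diff = np.asarray(predicted, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    clip = np.random.default_rng(0).normal(size=(8, 5, 3))  # T=8, V=5, C=3
    bones = joints_to_bones(clip)
    observed, future = split_clip(clip, num_observed=4)
    predicted = repeat_last_pose(observed, future.shape[0])
    print(bones.shape, pose_prediction_loss(predicted, future))
```

Because the loss needs only the clip itself and no action label, an objective of this kind can be computed on unlabeled data; in the method described above, a learned encoder and decoder would take the place of `repeat_last_pose`.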