Graph convolutional networks (GCNs) are the most commonly used methods for skeleton-based action recognition and have achieved remarkable performance. Generating adjacency matrices with semantically meaningful edges is particularly important for this task, but extracting such edges is a challenging problem. To solve this, we propose a hierarchically decomposed graph convolutional network (HD-GCN) architecture with a novel hierarchically decomposed graph (HD-Graph). The proposed HD-GCN decomposes every joint node into several sets to extract major structurally adjacent and distant edges, and uses them to construct an HD-Graph containing those edges in the same semantic spaces of a human skeleton. In addition, we introduce an attention-guided hierarchy aggregation (A-HA) module to highlight the dominant hierarchical edge sets of the HD-Graph. Furthermore, we apply a new six-way ensemble method, which uses only joint and bone streams without any motion stream. The proposed model achieves state-of-the-art performance on three large, popular datasets: NTU-RGB+D 60, NTU-RGB+D 120, and Northwestern-UCLA. Finally, we demonstrate the effectiveness of our model with various comparative experiments.
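As a rough illustration of the decomposition idea, the sketch below groups the joints of a toy skeleton into hierarchy sets and builds one adjacency matrix per pair of neighboring sets. It rests on assumptions that the abstract does not state: hierarchy sets are taken to be hop-distance levels from a chosen root joint, and neighboring sets are taken to be fully connected. The 7-joint skeleton, the root choice, and the function names are hypothetical and do not correspond to the NTU-RGB+D layout or to the authors' code.

```python
from collections import deque

# Hypothetical 7-joint toy skeleton (NOT the NTU-RGB+D joint layout).
BONES = [(0, 1), (1, 2), (1, 3), (3, 4), (0, 5), (5, 6)]
NUM_JOINTS = 7
ROOT = 0  # assumed root joint, e.g. a joint near the body center


def hierarchy_sets(bones, num_joints, root):
    """Group joints by hop distance from the root (breadth-first levels)."""
    neighbors = {j: [] for j in range(num_joints)}
    for a, b in bones:
        neighbors[a].append(b)
        neighbors[b].append(a)

    depth = {root: 0}
    queue = deque([root])
    while queue:
        j = queue.popleft()
        for k in neighbors[j]:
            if k not in depth:
                depth[k] = depth[j] + 1
                queue.append(k)

    levels = [[] for _ in range(max(depth.values()) + 1)]
    for j in sorted(depth):
        levels[depth[j]].append(j)
    return levels


def hierarchy_edge_sets(levels, num_joints):
    """Build one symmetric adjacency matrix per pair of neighboring levels.

    Assumption: every joint in a level is linked to every joint in the next
    level, so both physically adjacent and distant joints get edges.
    """
    matrices = []
    for upper, lower in zip(levels, levels[1:]):
        adj = [[0] * num_joints for _ in range(num_joints)]
        for u in upper:
            for v in lower:
                adj[u][v] = adj[v][u] = 1
        matrices.append(adj)
    return matrices


levels = hierarchy_sets(BONES, NUM_JOINTS, ROOT)
edge_sets = hierarchy_edge_sets(levels, NUM_JOINTS)
```

In this toy setup, joints 1 and 6 share no bone, yet the second edge set links them because they sit in neighboring levels. This is one way to read "structurally adjacent and distant edges"; a module such as A-HA would then weight the per-level edge sets rather than treat them equally.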