Graph convolutional networks (GCNs) are the most commonly used method for skeleton-based action recognition and have achieved remarkable performance. Generating adjacency matrices with semantically meaningful edges is particularly important for this task, but extracting such edges is a challenging problem. To solve this, we propose a hierarchically decomposed graph convolutional network (HD-GCN) architecture with a novel hierarchically decomposed graph (HD-Graph). The proposed HD-GCN effectively decomposes every joint node into several sets to extract major adjacent and distant edges, and uses them to construct an HD-Graph containing those edges in the same semantic spaces of a human skeleton. In addition, we introduce an attention-guided hierarchy aggregation (A-HA) module to highlight the dominant hierarchical edge sets of the HD-Graph. Furthermore, we apply a new two-stream-three-graph ensemble method, which uses only the joint and bone streams without any motion stream. The proposed model achieves state-of-the-art performance on three large, popular datasets: NTU-RGB+D 60, NTU-RGB+D 120, and Northwestern-UCLA. Finally, we demonstrate the effectiveness of our model with various comparative experiments.
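To make the idea of decomposing joints into hierarchical sets more concrete, here is a minimal sketch. It assumes that a joint's hierarchy level is simply its hop distance from a chosen root joint, and that each edge subset connects every joint in one level to every joint in the next level. This is a simplified illustration, not the paper's exact HD-Graph construction; the function names and the six-joint toy skeleton are hypothetical.

```python
from collections import deque


def hierarchy_levels(num_joints, bones, root):
    """Assign each joint a level equal to its hop distance from `root` (BFS).

    Assumption: levels are defined by graph distance on the bone graph.
    Joints unreachable from `root` receive no level.
    """
    adj = {j: [] for j in range(num_joints)}
    for a, b in bones:
        adj[a].append(b)
        adj[b].append(a)
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in level:
                level[nbr] = level[node] + 1
                queue.append(nbr)
    return level


def decompose_edges(num_joints, bones, root):
    """Group joints into level sets and build one edge subset per pair of consecutive levels.

    Each subset links every joint in level k to every joint in level k + 1,
    so it holds both physically adjacent (bone) edges and distant (non-bone) edges.
    """
    level = hierarchy_levels(num_joints, bones, root)
    depth = max(level.values())
    sets = [sorted(j for j, lvl in level.items() if lvl == k) for k in range(depth + 1)]
    subsets = [[(i, j) for i in sets[k] for j in sets[k + 1]] for k in range(depth)]
    return sets, subsets
```

On a toy skeleton with a spine (0), head (1), two shoulders (2, 3) and two hands (4, 5), rooted at the spine, the second subset contains the bone edge (2, 4) as well as the distant edge (2, 5) from the left shoulder to the right hand. Each subset could then be turned into its own adjacency matrix for a graph convolution layer.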
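The role of an attention-guided aggregation over hierarchical edge sets can be illustrated with a parameter-free stand-in. The sketch below assumes that each edge set produces one feature vector, scores each vector by its global average, normalizes the scores with a softmax, and fuses the vectors with the resulting weights. The real A-HA module uses learned layers; `attention_aggregate` is a hypothetical name, and the averaging scorer is an assumption made only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_aggregate(set_features):
    """Fuse K per-edge-set feature vectors with softmax attention weights.

    set_features: list of K equal-length vectors, one per hierarchical edge set.
    Assumption: each set is scored by the mean of its vector, a parameter-free
    stand-in for a learned scoring network.
    Returns (weights, fused_vector).
    """
    scores = [sum(f) / len(f) for f in set_features]
    weights = softmax(scores)
    dim = len(set_features[0])
    fused = [sum(w * f[c] for w, f in zip(weights, set_features)) for c in range(dim)]
    return weights, fused
```

With two edge sets whose features are `[1.0, 1.0]` and `[3.0, 3.0]`, the second set receives most of the weight, so the fused vector lies much closer to 3 than to 1. This is the sense in which attention "highlights" the dominant edge sets.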
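A two-stream-three-graph ensemble combines six models (joint and bone streams, each trained with three graph variants) at the score level. The sketch below assumes a simple weighted sum of per-class scores followed by an argmax. The graph labels `"g1"` to `"g3"` and the uniform weights are hypothetical placeholders, not the paper's settings.

```python
def ensemble(scores_by_model, weights):
    """Weighted score-level fusion across models.

    scores_by_model: dict mapping a model key, e.g. (stream, graph), to a list of per-class scores.
    weights: dict mapping the same keys to fusion weights (assumed, not from the paper).
    Returns (fused_scores, predicted_class_index).
    """
    num_classes = len(next(iter(scores_by_model.values())))
    fused = [0.0] * num_classes
    for key, scores in scores_by_model.items():
        w = weights[key]
        for c in range(num_classes):
            fused[c] += w * scores[c]
    pred = max(range(num_classes), key=lambda c: fused[c])
    return fused, pred
```

Because only the joint and bone streams are used, the dictionary holds six entries rather than the twelve a setup with two additional motion streams would need. Per-model weights are typically tuned on validation data.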