Graph convolutional networks (GCNs) have emerged as the dominant methods for skeleton-based action recognition. However, they still suffer from two problems: neighborhood constraints and entangled spatiotemporal feature representations. Most studies have focused on improving the design of the graph topology to solve the first problem, but have yet to fully explore the second. In this work, we design a disentangled spatiotemporal transformer (DSTT) block that overcomes these limitations of GCNs in three steps: (i) feature disentanglement for spatiotemporal decomposition; (ii) global spatiotemporal attention for capturing correlations in the global context; and (iii) local information enhancement for exploiting more local information. On this basis, we propose a novel architecture, named Hierarchical Graph Convolutional skeleton Transformer (HGCT), that exploits the complementary advantages of GCNs (i.e., local topology, temporal dynamics, and hierarchy) and Transformers (i.e., global context and dynamic attention). HGCT is lightweight and computationally efficient. Quantitative analysis demonstrates the superiority and good interpretability of HGCT.
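The three DSTT steps can be illustrated with a minimal NumPy sketch operating on a skeleton feature tensor of shape (frames T, joints V, channels C). This is not the authors' implementation. It assumes that (i) disentanglement means splitting the channels into a spatial half and a temporal half, (ii) global attention is plain scaled dot-product self-attention (over joints within each frame for the spatial half, over frames for each joint for the temporal half), and (iii) local enhancement is a hypothetical 3-frame moving average added residually. The names `dstt_block`, `init_params`, and `local_enhance` are illustrative only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over the second-to-last axis.
    x: (..., N, C) tokens; wq, wk, wv: (C, D) projection matrices."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])  # (..., N, N)
    return softmax(scores, axis=-1) @ v                          # (..., N, D)


def local_enhance(x):
    """Hypothetical local enhancement: average each frame with its two
    temporal neighbours (kernel size 3, zero padding). x: (T, V, C)."""
    padded = np.pad(x, ((1, 1), (0, 0), (0, 0)))
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0


def init_params(C, seed=0):
    # Random (C/2 x C/2) query/key/value projections for each branch (assumption).
    rng = np.random.default_rng(seed)
    half = C // 2

    def make():
        return tuple(rng.standard_normal((half, half)) / np.sqrt(half) for _ in range(3))

    return {"spatial": make(), "temporal": make()}


def dstt_block(x, params):
    """Sketch of a disentangled spatiotemporal block. x: (T, V, C), C even."""
    T, V, C = x.shape
    if C % 2:
        raise ValueError("channel count must be even for the channel split")
    half = C // 2
    # (i) Disentangle: one channel group for space, one for time.
    xs, xt = x[..., :half], x[..., half:]
    # (ii-a) Spatial attention: joints attend to all joints within each frame.
    ys = self_attention(xs, *params["spatial"])                  # (T, V, half)
    # (ii-b) Temporal attention: frames attend to all frames for each joint.
    yt = self_attention(np.swapaxes(xt, 0, 1), *params["temporal"])
    yt = np.swapaxes(yt, 0, 1)                                   # (T, V, half)
    y = np.concatenate([ys, yt], axis=-1)                        # (T, V, C)
    # (iii) Residual connection plus local temporal enhancement.
    return x + y + local_enhance(y)


if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal((16, 25, 8))  # 16 frames, 25 joints, 8 channels
    out = dstt_block(x, init_params(8))
    print(out.shape)  # (16, 25, 8)
```

Because the sketch uses no positional encodings, both attention branches treat joints symmetrically, so permuting the joint order simply permutes the output; a real model would typically add joint and frame embeddings or graph-based biases to break this symmetry.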