Skeleton-based action recognition is an important task that requires an adequate understanding of the movement characteristics of a human action from the given skeleton sequence. Recent studies have shown that exploring spatial and temporal features of the skeleton sequence is vital for this task. Nevertheless, how to effectively extract discriminative spatial and temporal features is still a challenging problem. In this paper, we propose a novel Attention Enhanced Graph Convolutional LSTM Network (AGC-LSTM) for human action recognition from skeleton data. The proposed AGC-LSTM can not only capture discriminative features in spatial configuration and temporal dynamics but also explore the co-occurrence relationship between the spatial and temporal domains. We also present a temporal hierarchical architecture to increase the temporal receptive field of the top AGC-LSTM layer, which boosts the ability to learn high-level semantic representations and significantly reduces the computation cost. Furthermore, to select discriminative spatial information, an attention mechanism is employed to enhance the information of key joints in each AGC-LSTM layer. Experimental results are provided on two datasets: the NTU RGB+D dataset and the Northwestern-UCLA dataset. The comparisons demonstrate the effectiveness of our approach, which outperforms state-of-the-art methods on both datasets.
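To make the described architecture concrete, the following is a minimal sketch of the core idea: an LSTM cell whose gates are computed with graph convolutions over the joint graph, followed by a soft attention that re-weights key joints in the hidden state. The class names (`GraphConv`, `AGCLSTMCell`), the single normalized adjacency matrix, and the additive attention are simplifying assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """Graph convolution over joints: aggregate neighbor features via a
    (normalized) adjacency matrix, then apply a learned linear map."""
    def __init__(self, in_dim, out_dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)           # (V, V) joint adjacency
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x):                          # x: (N, V, in_dim)
        return self.linear(torch.einsum("uv,nvc->nuc", self.adj, x))

class AGCLSTMCell(nn.Module):
    """Illustrative AGC-LSTM cell (assumed simplification): LSTM gates are
    computed with graph convolutions, and a per-joint soft attention
    enhances the hidden state of key joints."""
    def __init__(self, in_dim, hid_dim, adj):
        super().__init__()
        self.gc_x = GraphConv(in_dim, 4 * hid_dim, adj)
        self.gc_h = GraphConv(hid_dim, 4 * hid_dim, adj)
        self.attn = nn.Linear(hid_dim, 1)          # per-joint attention score

    def forward(self, x_t, h, c):                  # shapes: (N, V, *)
        gates = self.gc_x(x_t) + self.gc_h(h)
        i, f, o, g = gates.chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        a = torch.softmax(self.attn(h), dim=1)     # (N, V, 1): key-joint weights
        return h + a * h, c                        # attention-enhanced hidden state

# Toy usage: 25 joints, 3-D joint coordinates, 64 hidden channels.
V, C, H = 25, 3, 64
adj = torch.eye(V)                                 # placeholder adjacency
cell = AGCLSTMCell(C, H, adj)
h = torch.zeros(2, V, H); c = torch.zeros(2, V, H)
for t in range(10):                                # unroll over a toy sequence
    h, c = cell(torch.randn(2, V, C), h, c)
```

In this sketch, the temporal hierarchy described in the abstract would correspond to stacking several such layers with temporal pooling between them, so the top layer operates on a coarser, longer-range time axis.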