Pose-based hand gesture recognition has been widely studied in recent years. Compared with full-body action recognition, hand gestures involve joints that are spatially closer together and more strongly coordinated, so capturing their complex spatial features calls for a different approach from that used in action recognition. Many gesture categories, such as "Grab" and "Pinch", also have very similar motion or temporal patterns, which poses a challenge for temporal processing. To address these challenges, this paper proposes a two-stream neural network. One stream is a self-attention based graph convolutional network (SAGCN) that extracts short-term temporal information and hierarchical spatial information; the other is a residual-connection enhanced bidirectional Independently Recurrent Neural Network (RBi-IndRNN) that extracts long-term temporal information. The SAGCN uses a dynamic self-attention mechanism to adaptively exploit the relationships among all hand joints, in addition to the fixed topology and local feature extraction of the GCN. The RBi-IndRNN extends the IndRNN with bidirectional processing for temporal modelling. The two streams are fused for recognition. The method is validated on the Dynamic Hand Gesture dataset and the First-Person Hand Action dataset, where it achieves state-of-the-art performance.
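As a rough illustration of the recurrent stream, the sketch below shows the general idea of a bidirectional IndRNN with a residual connection. It is not the authors' implementation. The function names (`indrnn_pass`, `residual_bi_indrnn`), the omission of the input projection and bias, and the choice to sum the forward state, backward state, and input are all assumptions made for brevity. The one property taken from the IndRNN itself is that each hidden unit has its own scalar recurrent weight, so the recurrence is element-wise.

```python
from typing import List

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def indrnn_pass(inputs: List[Vector], u: Vector) -> List[Vector]:
    """Run h_t = relu(x_t + u * h_{t-1}) over a sequence, element-wise.

    Assumption: the input projection W x_t + b is taken to be the
    identity, so only the independent recurrent weights u remain.
    """
    h = [0.0] * len(u)
    states = []
    for x in inputs:
        h = relu([xi + ui * hi for xi, ui, hi in zip(x, u, h)])
        states.append(h)
    return states


def residual_bi_indrnn(
    inputs: List[Vector], u_fwd: Vector, u_bwd: Vector
) -> List[Vector]:
    """Hypothetical residual bidirectional layer.

    The backward pass reads the sequence in reverse and is flipped back
    so that its states align with the forward ones. Summing the two
    directions with the input is an assumed fusion, chosen so that the
    residual connection needs no extra projection.
    """
    fwd = indrnn_pass(inputs, u_fwd)
    bwd = indrnn_pass(inputs[::-1], u_bwd)[::-1]
    return [
        [xi + fi + bi for xi, fi, bi in zip(x, f, b)]
        for x, f, b in zip(inputs, fwd, bwd)
    ]


if __name__ == "__main__":
    # Three time steps of a two-dimensional feature vector.
    sequence = [[1.0, 2.0], [0.5, -1.0], [0.0, 1.0]]
    print(residual_bi_indrnn(sequence, u_fwd=[0.5, 0.5], u_bwd=[1.0, 0.0]))
```

Because the recurrent weight is a vector rather than a matrix, each unit's state depends only on its own history. A real layer would add learned input weights, a bias, and typically normalisation.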