In skeleton-based action recognition, Graph Convolutional Networks (GCNs) model human skeletal joints as vertices and connect them through an adjacency matrix, which can be seen as a local attention mask. However, in most existing GCNs, the local attention mask is defined by the natural connections of the human skeleton and ignores dynamic relations between joints that are not physically linked, for example the head, hands, and feet. In addition, the attention mechanism has proven effective in Natural Language Processing and image description, but it has rarely been investigated in existing skeleton-based methods. In this work, we propose a new adaptive spatial attention layer that extends the local attention map to a global one based on relative distance and relative angle information. Moreover, we design a new initial graph adjacency matrix that connects the head, hands, and feet, which yields a visible improvement in action recognition accuracy. The proposed model is evaluated on two large-scale and challenging datasets of human activities in daily life: NTU-RGB+D and Kinetics skeleton. The results demonstrate that our model performs strongly on both datasets.
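To make the idea of the initial adjacency matrix concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical 11-joint toy skeleton (the joint names, indices, and bone list are illustrative and do not follow the NTU-RGB+D or Kinetics layouts), and it assumes that the extra links fully connect the head, hands, and feet, which the abstract does not specify. The helper names `build_adjacency` and `row_normalize` are likewise hypothetical.

```python
# Hypothetical 11-joint toy skeleton. Names and indices are illustrative only
# and are NOT the NTU-RGB+D or Kinetics joint layout.
JOINTS = ["head", "neck", "torso",
          "l_elbow", "l_hand", "r_elbow", "r_hand",
          "l_knee", "l_foot", "r_knee", "r_foot"]
IDX = {name: i for i, name in enumerate(JOINTS)}

# Natural (bone) connections of the toy skeleton.
BONES = [("head", "neck"), ("neck", "torso"),
         ("neck", "l_elbow"), ("l_elbow", "l_hand"),
         ("neck", "r_elbow"), ("r_elbow", "r_hand"),
         ("torso", "l_knee"), ("l_knee", "l_foot"),
         ("torso", "r_knee"), ("r_knee", "r_foot")]

# End joints that receive extra links (assumption: fully connected).
END_JOINTS = ["head", "l_hand", "r_hand", "l_foot", "r_foot"]


def build_adjacency(extra_links=True):
    """Return a symmetric adjacency matrix with self-loops as nested lists."""
    n = len(JOINTS)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop so each joint keeps its own feature
    edges = list(BONES)
    if extra_links:
        # Add one edge for every unordered pair of end joints.
        edges += [(a, b)
                  for k, a in enumerate(END_JOINTS)
                  for b in END_JOINTS[k + 1:]]
    for a, b in edges:
        i, j = IDX[a], IDX[b]
        adj[i][j] = adj[j][i] = 1.0
    return adj


def row_normalize(adj):
    """Scale each row to sum to 1, a common (assumed) GCN normalization."""
    return [[v / sum(row) for v in row] for row in adj]
```

With the extra links enabled, `build_adjacency()[IDX["head"]][IDX["l_foot"]]` is nonzero, so a single graph convolution can pass information directly between the head and a foot instead of routing it through several bones.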