Owing to the compact and information-rich high-level representations it offers, skeleton-based human action recognition has recently become a highly active research topic. Previous studies have demonstrated that modeling joint relationships along the spatial and temporal dimensions yields information critical to action recognition. However, effectively encoding the global dependencies of joints during spatio-temporal feature extraction remains challenging. In this paper, we introduce the Action Capsule, which identifies action-related key joints by considering the latent correlations of joints in a skeleton sequence. We show that, during inference, our end-to-end network attends to a set of joints specific to each action, whose encoded spatio-temporal features are aggregated to recognize the action. Moreover, stacking multiple stages of action capsules improves the network's ability to distinguish similar actions. As a result, our network outperforms state-of-the-art approaches on the N-UCLA dataset and achieves competitive results on the NTU RGB+D dataset, while requiring significantly less computation as measured in GFLOPs.
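The abstract leaves the internal mechanics of an action capsule unspecified; purely as a point of reference, the sketch below implements dynamic routing-by-agreement as used in standard capsule networks (Sabour et al., 2017), where the learned coupling coefficients play a role loosely analogous to the per-action joint weighting described above. All names and shapes here (`squash`, `dynamic_routing`, `u_hat`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Capsule non-linearity: shrinks short vectors toward 0 and
    # long vectors toward unit length, preserving direction.
    norm2 = np.sum(s ** 2, axis=axis, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

def dynamic_routing(u_hat, n_iters=3):
    # u_hat: (num_in, num_out, dim_out) prediction vectors from
    # lower-level capsules (e.g. per-joint spatio-temporal features).
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits
    for _ in range(n_iters):
        # Coupling coefficients: stable softmax over output capsules.
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)   # weighted sum, (num_out, dim_out)
        v = squash(s)                            # output capsules
        b += (u_hat * v[None]).sum(axis=-1)      # agreement update
    return v

# Toy usage: 25 joint capsules predicting 10 action-class capsules of dim 16.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(25, 10, 16))
v = dynamic_routing(u_hat)
print(v.shape)  # (10, 16); the norm of v[j] reflects confidence in class j
```

Under this reading, the coupling coefficients `c` concentrate on the input capsules (joints) that agree most with each output capsule, which is one plausible way a network could come to attend to a set of joints specific to each action.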