Skeleton-based action recognition has attracted considerable research attention in recent years. A common drawback of currently popular skeleton-based human action recognition methods is that sparse skeleton information alone is insufficient to fully characterize human motion. This limitation leaves several existing methods unable to correctly classify action categories that differ only in subtle motion. In this paper, we propose a novel framework that jointly employs the human pose skeleton and joint-centered light-weight information in a two-stream graph convolutional network, namely, JOLO-GCN. Specifically, we use Joint-aligned optical Flow Patches (JFP) to capture the local subtle motion around each joint as the pivotal joint-centered visual information. Compared to a pure skeleton-based baseline, this hybrid scheme effectively boosts performance while keeping the computational and memory overheads low. Experiments on the NTU RGB+D, NTU RGB+D 120, and Kinetics-Skeleton datasets demonstrate clear accuracy improvements attained by the proposed method over state-of-the-art skeleton-based methods.
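To make the JFP idea concrete, the following is a minimal sketch, not the authors' implementation, of cropping a fixed-size optical-flow patch around each joint; the function name extract_jfp, the patch size of 32, and the zero-padding strategy for border joints are illustrative assumptions.

```python
import numpy as np

def extract_jfp(flow, joints, patch_size=32):
    """Crop a fixed-size optical-flow patch centered on each joint.

    flow:   (H, W, 2) dense optical-flow field (dx, dy per pixel).
    joints: (J, 2) array of (x, y) joint coordinates in pixels.
    Returns a (J, patch_size, patch_size, 2) stack of patches.
    (Hypothetical sketch; the paper's actual patch size and border
    handling may differ.)
    """
    h, w, _ = flow.shape
    half = patch_size // 2
    # Zero-pad the flow field so patches near the image border
    # keep a fixed size (an assumed border strategy).
    padded = np.pad(flow, ((half, half), (half, half), (0, 0)))
    patches = []
    for x, y in joints:
        # Clamp the joint to the valid image area, then shift into
        # the padded coordinate frame.
        cx = int(np.clip(round(x), 0, w - 1)) + half
        cy = int(np.clip(round(y), 0, h - 1)) + half
        patches.append(padded[cy - half:cy + half, cx - half:cx + half])
    return np.stack(patches)

# Toy usage: a 240x320 flow field with three joints.
flow = np.random.randn(240, 320, 2).astype(np.float32)
joints = np.array([[160.0, 120.0], [5.0, 5.0], [315.0, 230.0]])
jfp = extract_jfp(flow, joints, patch_size=32)
print(jfp.shape)  # (3, 32, 32, 2)
```

In the two-stream setup described above, patch stacks of this kind would feed the JFP stream while the skeleton stream consumes the joint coordinates themselves.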