In this paper, we study the problem of one-shot skeleton-based action recognition, which poses unique challenges in learning representations that transfer from base classes to novel classes, particularly for fine-grained actions. Existing meta-learning frameworks typically rely on body-level representations in the spatial dimension, which limits their ability to generalise and capture subtle visual differences in a fine-grained label space. To overcome this limitation, we propose a part-aware prototypical representation for one-shot skeleton-based action recognition. Our method captures skeleton motion patterns at two distinct spatial levels: one models global context among all body joints, referred to as the body level, while the other attends to local spatial regions of body parts, referred to as the part level. We also devise a class-agnostic attention mechanism to highlight important parts for each action class. Specifically, we develop a part-aware prototypical graph network consisting of three modules: a cascaded embedding module for our dual-level modelling, an attention-based part fusion module that fuses parts and generates part-aware prototypes, and a matching module that performs classification with the part-aware representations. We demonstrate the effectiveness of our method on two public skeleton-based action recognition datasets: NTU RGB+D 120 and NW-UCLA.
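To make the pipeline concrete, below is a minimal sketch (not the authors' released code) of the two steps the abstract describes after embedding: class-agnostic attention fusion of part-level features into a part-aware embedding, and prototypical matching of queries against class prototypes. The part features are assumed to come from a skeleton embedding backbone (omitted here), and all module and variable names are illustrative assumptions.

```python
# Hedged sketch: attention-based part fusion + prototype matching.
# Assumes part-level features are already extracted by a skeleton encoder
# (e.g. a graph-convolutional backbone, not shown); names are illustrative.
import torch
import torch.nn as nn


class PartFusion(nn.Module):
    """Class-agnostic attention over body-part features -> part-aware embedding."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # One scalar attention score per part, shared across all classes.
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, part_feats: torch.Tensor) -> torch.Tensor:
        # part_feats: (batch, num_parts, feat_dim)
        attn = torch.softmax(self.score(part_feats), dim=1)  # (batch, num_parts, 1)
        return (attn * part_feats).sum(dim=1)                # (batch, feat_dim)


def prototype_match(support_feats, support_labels, query_feats, num_classes):
    """Average fused support embeddings per class into prototypes, then
    classify queries by negative Euclidean distance (standard prototypical
    matching; the paper's matching module may differ in detail)."""
    protos = torch.stack([
        support_feats[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])                                                       # (num_classes, feat_dim)
    dists = torch.cdist(query_feats, protos)                 # (num_query, num_classes)
    return (-dists).softmax(dim=-1)                          # class probabilities


if __name__ == "__main__":
    # Toy 5-way 1-shot episode with 10 body parts and 64-d part features.
    num_classes, feat_dim = 5, 64
    fusion = PartFusion(feat_dim)
    support = torch.randn(num_classes, 10, feat_dim)         # one shot per class
    queries = torch.randn(8, 10, feat_dim)
    support_labels = torch.arange(num_classes)
    probs = prototype_match(fusion(support), support_labels,
                            fusion(queries), num_classes)
    print(probs.shape)  # torch.Size([8, 5])
```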