Action recognition has recently received increasing attention owing to its wide range of practical applications in intelligent surveillance and human-computer interaction. However, few-shot action recognition has not been well explored and remains challenging because of data scarcity. In this paper, we propose a novel hierarchical compositional representations (HCR) learning approach for few-shot action recognition. Specifically, we divide a complicated action into several sub-actions by carefully designed hierarchical clustering and further decompose the sub-actions into more fine-grained spatially attentional sub-actions (SAS-actions). Although base classes and novel classes can differ substantially, they can share similar patterns at the level of sub-actions or SAS-actions. Furthermore, we adopt the Earth Mover's Distance (EMD) from the transportation problem to measure the similarity between video samples in terms of their sub-action representations. EMD computes the optimal matching flows between sub-actions as the distance metric, which is favorable for comparing fine-grained patterns. Extensive experiments show that our method achieves state-of-the-art results on the HMDB51, UCF101 and Kinetics datasets.
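The abstract does not specify the clustering criterion or how EMD weights are assigned, so the following is only a minimal sketch of the two ideas under simplifying assumptions. `segment_into_sub_actions` is a hypothetical temporally constrained agglomerative clustering (repeatedly merge the closest pair of adjacent segments), standing in for the paper's "carefully designed" hierarchical clustering. `emd_uniform` is a hypothetical EMD that assumes both videos have the same number of sub-actions, each carrying weight 1/k, with Euclidean ground distance. Under those assumptions the optimal transport flow is a permutation (Birkhoff-von Neumann theorem), so a brute-force search over matchings gives the exact EMD for small k. Frame features are plain lists of floats here, not learned embeddings.

```python
import itertools
import math
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def euclidean(a: Vector, b: Vector) -> float:
    """Ground distance between two sub-action features (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_into_sub_actions(frames: List[Vector], k: int) -> List[Vector]:
    """Hypothetical sub-action segmentation, not the paper's exact method.

    Bottom-up (agglomerative) clustering restricted to temporally adjacent
    frames: start with one segment per frame, repeatedly merge the adjacent
    pair whose mean features are closest, stop at k segments, and return one
    mean feature vector per sub-action.
    """
    if not 1 <= k <= len(frames):
        raise ValueError("k must be between 1 and the number of frames")
    segments = [[f] for f in frames]
    while len(segments) > k:
        means = [mean(s) for s in segments]
        i = min(range(len(segments) - 1),
                key=lambda j: euclidean(means[j], means[j + 1]))
        # Replace segments i and i+1 with their concatenation.
        segments[i:i + 2] = [segments[i] + segments[i + 1]]
    return [mean(s) for s in segments]


def emd_uniform(src: List[Vector], dst: List[Vector]) -> float:
    """EMD between two equal-size sets of sub-actions with weights 1/k.

    With uniform, equal weights the optimal flow matrix is a permutation
    scaled by 1/k, so EMD = (1/k) * min over matchings of the summed ground
    distances. Brute force is exact but only practical for small k.
    """
    k = len(src)
    if k == 0 or k != len(dst):
        raise ValueError("both videos need the same, non-zero number of sub-actions")
    cost = [[euclidean(a, b) for b in dst] for a in src]
    best = min(sum(cost[i][p[i]] for i in range(k))
               for p in itertools.permutations(range(k)))
    return best / k


if __name__ == "__main__":
    # Toy 1-D "frame features": three sub-actions, the second video
    # performs them in reverse temporal order.
    video_a = [[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]]
    video_b = [[2.0], [2.1], [1.0], [1.1], [0.0], [0.1]]
    subs_a = segment_into_sub_actions(video_a, 3)
    subs_b = segment_into_sub_actions(video_b, 3)
    print(subs_a, subs_b, emd_uniform(subs_a, subs_b))
```

Because EMD matches sub-actions as sets rather than aligning them frame by frame, the two toy videos above get a distance of zero even though their sub-actions occur in a different order. For larger k with uniform weights, an assignment solver such as SciPy's `linear_sum_assignment` would replace the brute-force search; non-uniform weights require a general optimal-transport or linear-programming solver.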