Spatio-temporal Relation Modeling for Few-shot Action Recognition

We propose a novel few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations. The focus of our approach is a novel spatio-temporal enrichment module that aggregates spatial and temporal contexts with dedicated local patch-level and global frame-level feature enrichment sub-modules. Local patch-level enrichment captures the appearance-based characteristics of actions. On the other hand, global frame-level enrichment explicitly encodes the broad temporal context, thereby capturing the relevant object features over time. The resulting spatio-temporally enriched representations are then utilized to learn the relational matching between query and support action sub-sequences. We further introduce a query-class similarity classifier on the patch-level enriched features to enhance class-specific feature discriminability by reinforcing the feature learning at different stages in the proposed framework. Experiments are performed on four few-shot action recognition benchmarks: Kinetics, SSv2, HMDB51 and UCF101. Our extensive ablation study reveals the benefits of the proposed contributions. Furthermore, our approach sets a new state-of-the-art on all four benchmarks. On the challenging SSv2 benchmark, our approach achieves an absolute gain of 3.5% in classification accuracy, as compared to the best existing method in the literature. Our code and models will be publicly released.
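The two enrichment stages described above can be pictured with a small sketch. The abstract does not name the operators used inside each sub-module, so everything below is an assumption made for illustration: patch-level enrichment is approximated with plain dot-product self-attention over the patches of one frame, frame-level enrichment with a residual mix against the clip-wide mean, and the function names (`enrich_patches`, `enrich_frames`, `enrich_video`) are hypothetical. It is not the STRM implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def enrich_patches(patches):
    """Local stage (assumed operator): every patch attends to all patches
    of the same frame, and the attended context is added as a residual."""
    dim = len(patches[0])
    enriched = []
    for query in patches:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in patches])
        context = [sum(w * p[d] for w, p in zip(weights, patches)) for d in range(dim)]
        enriched.append([x + c for x, c in zip(query, context)])
    return enriched


def pool_frame(patches):
    """Average the patch vectors of one frame into a single frame vector."""
    dim = len(patches[0])
    return [sum(p[d] for p in patches) / len(patches) for d in range(dim)]


def enrich_frames(frames):
    """Global stage (assumed operator): each frame vector is mixed with the
    mean over all frames, so it carries clip-wide temporal context."""
    dim = len(frames[0])
    clip_mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[x + m for x, m in zip(frame, clip_mean)] for frame in frames]


def enrich_video(video):
    """video: list of frames, each frame a list of patch feature vectors."""
    frame_features = [pool_frame(enrich_patches(frame)) for frame in video]
    return enrich_frames(frame_features)


# Toy clip: 3 frames, 2 patches per frame, 2-dimensional patch features.
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.0, 2.0], [1.0, 1.0]],
]
enriched = enrich_video(video)
```

The ordering (local first, then global) follows the abstract; in a real model both stages would be learned layers rather than the fixed operations used here.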

Related content

Few-shot learning

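Few-shot learning (FSL) addresses the problem of improving a neural network's performance when only a small amount of data is available. One family of methods commonly used in FSL is meta-learning. Like ordinary neural network training, meta-learning has a training phase and a testing phase, but they are called meta-training and meta-testing, respectively.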
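Meta-training and meta-testing are usually organized as episodes: small N-way K-shot tasks, each with a support set and a query set. The sketch below shows one common way to sample such an episode. The function name `sample_episode`, its parameters, and the toy dataset are hypothetical and are not taken from any particular library or from the paper above.

```python
import random


def sample_episode(items_by_class, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode.

    items_by_class maps a class label to a list of examples.
    Returns (support, query), each a list of (example, label) pairs.
    """
    # Pick the N classes that define this episode's classification task.
    classes = rng.sample(sorted(items_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Draw K support and n_query query examples without overlap.
        picked = rng.sample(items_by_class[label], k_shot + n_query)
        support += [(item, label) for item in picked[:k_shot]]
        query += [(item, label) for item in picked[k_shot:]]
    return support, query


# Toy dataset: 10 classes with 5 video ids each (ids are placeholders).
dataset = {f"class_{c}": [f"video_{c}_{i}" for i in range(5)] for c in range(10)}
rng = random.Random(0)
support, query = sample_episode(dataset, n_way=5, k_shot=1, n_query=2, rng=rng)
```

In meta-training the episodes are drawn from the training classes; in meta-testing the same sampling is applied to classes that were held out, so the model is evaluated on tasks built from categories it has not seen.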
