We propose a novel few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations. The focus of our approach is a novel spatio-temporal enrichment module that aggregates spatial and temporal contexts with dedicated local patch-level and global frame-level feature enrichment sub-modules. Local patch-level enrichment captures the appearance-based characteristics of actions. On the other hand, global frame-level enrichment explicitly encodes the broad temporal context, thereby capturing the relevant object features over time. The resulting spatio-temporally enriched representations are then utilized to learn the relational matching between query and support action sub-sequences. We further introduce a query-class similarity classifier on the patch-level enriched features to enhance class-specific feature discriminability by reinforcing the feature learning at different stages in the proposed framework. Experiments are performed on four few-shot action recognition benchmarks: Kinetics, SSv2, HMDB51 and UCF101. Our extensive ablation study reveals the benefits of the proposed contributions. Furthermore, our approach sets a new state-of-the-art on all four benchmarks. On the challenging SSv2 benchmark, our approach achieves an absolute gain of 3.5% in classification accuracy, as compared to the best existing method in the literature. Our code and models will be publicly released.
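To make the data flow described in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not specify how either enrichment sub-module is built, so the sketch assumes self-attention over the patches of each frame for local patch-level enrichment, and a small MLP that mixes information along the frame axis for global frame-level enrichment. The cosine-similarity form of the query-class classifier, the tensor sizes, the random weights, and all function names are likewise hypothetical. The relational matching between query and support sub-sequences is omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def patch_level_enrichment(x):
    """Assumed form: self-attention over the patches of each frame, plus a residual.

    x has shape (T, P, D): T frames, P patches per frame, D feature dimensions.
    """
    d = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)  # (T, P, P)
    attn = softmax(scores, axis=-1)
    return x + attn @ x  # (T, P, D)


def frame_level_enrichment(frames, w1, w2):
    """Assumed form: a two-layer MLP mixing along the frame axis, plus a residual.

    frames has shape (T, D); w1 has shape (H, T); w2 has shape (T, H).
    """
    hidden = np.maximum(w1 @ frames, 0.0)  # ReLU, shape (H, D)
    return frames + w2 @ hidden  # (T, D)


def enrich(video, w1, w2):
    """Apply patch-level, then frame-level enrichment to one video."""
    patches = patch_level_enrichment(video)  # (T, P, D)
    frames = patches.mean(axis=1)  # pool patches into frame features, (T, D)
    return patches, frame_level_enrichment(frames, w1, w2)


def query_class_similarity(query_vec, class_protos):
    """Assumed form: softmax over cosine similarities to per-class prototypes."""
    q = query_vec / np.linalg.norm(query_vec)
    c = class_protos / np.linalg.norm(class_protos, axis=1, keepdims=True)
    return softmax(c @ q)  # (C,)


# Hypothetical sizes: 8 frames, 16 patches, 32-dim features, 5 classes.
T, P, D, H, C = 8, 16, 32, 12, 5
rng = np.random.default_rng(0)
w1 = 0.1 * rng.standard_normal((H, T))
w2 = 0.1 * rng.standard_normal((T, H))

# One support video per class (5-way 1-shot) and one query video.
support = rng.standard_normal((C, T, P, D))
query = rng.standard_normal((T, P, D))

patches_q, frames_q = enrich(query, w1, w2)

# As in the abstract, the classifier operates on patch-level enriched features.
# Here they are pooled over frames and patches into one vector per video.
protos = np.stack([enrich(v, w1, w2)[0].mean(axis=(0, 1)) for v in support])
probs = query_class_similarity(patches_q.mean(axis=(0, 1)), protos)
```

In this sketch, `frames_q` stands in for the spatio-temporally enriched representation that the relational matching stage would consume, while `probs` illustrates how an auxiliary classifier on the intermediate patch-level features can provide a class-specific training signal at an earlier stage.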