Temporal-Relational CrossTransformers for Few-Shot Action Recognition

We propose a novel approach to few-shot action recognition that finds temporally corresponding frame tuples between the query and the videos in the support set. Unlike previous few-shot work, we construct class prototypes using the CrossTransformer attention mechanism to observe relevant sub-sequences of all support videos, rather than using class averages or single best matches. Video representations are formed from ordered tuples of varying numbers of frames, which allows sub-sequences of actions at different speeds and temporal offsets to be compared. Our proposed Temporal-Relational CrossTransformers (TRX) achieve state-of-the-art results on few-shot splits of Kinetics, Something-Something V2 (SSv2), HMDB51 and UCF101. Importantly, our method outperforms prior work on SSv2 by a wide margin (12%) due to its ability to model temporal relations. A detailed ablation showcases the importance of matching to multiple support set videos and learning higher-order relational CrossTransformers.
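
The matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections, layer normalization, or positional encodings, uses toy per-frame feature vectors, and all function names (`ordered_tuples`, `tuple_features`, `query_specific_prototype`, `class_distance`) are hypothetical.

```python
import itertools
import math


def ordered_tuples(num_frames, cardinality):
    """All frame-index tuples of the given size, kept in temporal order."""
    return list(itertools.combinations(range(num_frames), cardinality))


def tuple_features(frames, cardinality):
    """Concatenate the per-frame feature vectors of every ordered tuple."""
    return [
        [x for idx in tup for x in frames[idx]]
        for tup in ordered_tuples(len(frames), cardinality)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_specific_prototype(query_tuple, support_tuples):
    """Attention-weighted sum of support tuples for one query tuple.

    Assumption: raw features serve as queries, keys and values.
    """
    scale = math.sqrt(len(query_tuple))
    scores = [
        sum(q * s for q, s in zip(query_tuple, sup)) / scale
        for sup in support_tuples
    ]
    weights = softmax(scores)
    return [
        sum(w * sup[d] for w, sup in zip(weights, support_tuples))
        for d in range(len(query_tuple))
    ]


def class_distance(query_frames, support_videos, cardinality=2):
    """Mean squared distance between query tuples and their prototypes.

    Tuples are pooled over all support videos of the class, so the
    prototype can draw on sub-sequences from several videos at once.
    """
    query_tuples = tuple_features(query_frames, cardinality)
    pooled = [
        t for video in support_videos
        for t in tuple_features(video, cardinality)
    ]
    total = 0.0
    for q in query_tuples:
        proto = query_specific_prototype(q, pooled)
        total += sum((a - b) ** 2 for a, b in zip(q, proto))
    return total / len(query_tuples)
```

Under these assumptions, a query would be assigned to the class with the smallest `class_distance`. Because tuples preserve frame order, a support video containing the same two frames in reverse order yields a larger distance than one in the original order, which is the property that lets tuple-based matching capture temporal relations.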

Related content

Few-shot learning

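Few-shot learning (FSL) addresses how to improve a neural network's performance when only a small amount of data is available. A commonly used family of methods in FSL is meta-learning. Like ordinary neural network training, meta-learning has a training phase and a testing phase, but these are called meta-training and meta-testing, respectively.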
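
As a rough illustration of how meta-training and meta-testing are commonly organized, the sketch below samples one N-way K-shot episode made of a support set and a query set. The episodic setup is a widespread convention rather than something stated above, and the function name `sample_episode` and the toy dataset layout (a dict from class label to a list of items) are assumptions made for this example.

```python
import random


def sample_episode(labels_to_items, n_way, k_shot, n_query, rng):
    """Sample one few-shot episode.

    Returns (support, query), each a list of (item, label) pairs.
    The support set holds k_shot items per class and the query set
    holds n_query further items per class, with no overlap.
    """
    # Sort the labels so the draw is reproducible for a seeded rng.
    classes = rng.sample(sorted(labels_to_items), n_way)
    support, query = [], []
    for label in classes:
        items = rng.sample(labels_to_items[label], k_shot + n_query)
        support += [(item, label) for item in items[:k_shot]]
        query += [(item, label) for item in items[k_shot:]]
    return support, query


# Toy dataset: 5 classes with 10 placeholder video ids each.
dataset = {
    f"class_{c}": [f"video_{c}_{i}" for i in range(10)] for c in range(5)
}
support_set, query_set = sample_episode(
    dataset, n_way=3, k_shot=2, n_query=1, rng=random.Random(0)
)
```

In this convention, meta-training draws many such episodes from the training classes, while meta-testing draws them from classes that were held out entirely.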

[ACM Multimedia 2020] Dual Temporal Memory Network for Efficient Video Object Segmentation

Zhuanzhi member service

10+ reads · August 13, 2020

[DeepMind] CrossTransformers: Spatially-Aware Few-Shot Transfer

Zhuanzhi member service

40+ reads · July 26, 2020

[CVPR 2020] Transferring Cross-domain Knowledge for Video Sign Language Recognition

Zhuanzhi member service

9+ reads · April 17, 2020

[CVPR 2020] Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

Zhuanzhi member service

78+ reads · February 25, 2020