Few-shot action recognition aims to recognize novel action classes (query) using just a few samples (support). The majority of current approaches follow the metric learning paradigm, which learns to compare the similarity between videos. Recently, it has been observed that directly measuring this similarity is not ideal, since different action instances may show distinct temporal distributions, resulting in severe misalignment issues across query and support videos. In this paper, we tackle this problem from two distinct aspects -- action duration misalignment and action evolution misalignment. We address them sequentially through a Two-stage Action Alignment Network (TA2N). The first stage locates the action by learning a temporal affine transform, which warps each video feature to its action duration while dismissing action-irrelevant features (e.g., background). Next, the second stage coordinates the query feature to match the spatio-temporal action evolution of the support by performing temporal rearrangement and spatial offset prediction. Extensive experiments on benchmark datasets show the potential of the proposed method in achieving state-of-the-art performance for few-shot action recognition. The code of this project can be found at https://github.com/R00Kie-Liu/TA2N
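To make the first-stage idea concrete, the following is a minimal sketch (not the authors' released code) of a temporal affine transform that predicts a scale and shift for the action span from pooled frame features and warps the feature sequence accordingly; the module and parameter names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalAffineWarp(nn.Module):
    """Hypothetical sketch of stage one: warp frame features onto the action duration."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Predicts (scale, shift) of the action span from the pooled clip feature.
        self.locator = nn.Linear(feat_dim, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C) frame-level features
        B, T, C = x.shape
        theta = self.locator(x.mean(dim=1))           # (B, 2): raw scale, shift
        scale = torch.sigmoid(theta[:, 0:1])          # keep scale in (0, 1)
        shift = torch.tanh(theta[:, 1:2])             # keep shift in (-1, 1)

        # Normalized temporal sampling grid, then apply the affine map per video.
        base = torch.linspace(-1.0, 1.0, T, device=x.device).unsqueeze(0)  # (1, T)
        grid_t = scale * base + shift                                      # (B, T)

        # Resample with grid_sample by treating the sequence as a (B, C, 1, T) image.
        grid = torch.stack([grid_t, torch.zeros_like(grid_t)], dim=-1)     # (B, T, 2)
        grid = grid.unsqueeze(1)                                           # (B, 1, T, 2)
        warped = F.grid_sample(x.transpose(1, 2).unsqueeze(2), grid,
                               mode='bilinear', align_corners=True)        # (B, C, 1, T)
        return warped.squeeze(2).transpose(1, 2)                           # (B, T, C)


# Usage example with random features standing in for a video backbone's output.
if __name__ == "__main__":
    feats = torch.randn(4, 8, 256)          # 4 videos, 8 frames, 256-dim features
    aligned = TemporalAffineWarp(256)(feats)
    print(aligned.shape)                    # torch.Size([4, 8, 256])
```

The second stage (temporal rearrangement and spatial offset prediction between query and support) would operate on such duration-aligned features, but its details are specific to the paper and are not sketched here.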