Few-shot action recognition aims to recognize novel action classes from only a small number of labeled training samples. In this work, we propose an approach that first summarizes each video into compound prototypes, consisting of a group of global prototypes and a group of focused prototypes, and then measures video similarity based on these prototypes. Each global prototype is encouraged to summarize a specific aspect of the entire video, for example, the start or the evolution of the action. Since no explicit annotation is provided for the global prototypes, we additionally use a group of focused prototypes, each of which attends to certain timestamps in the video. We measure video similarity by matching the compound prototypes of the support and query videos. The global prototypes are matched directly, so that videos are compared from the same perspective, for example, whether two actions start similarly. For the focused prototypes, since actions exhibit temporal variations across videos, we apply bipartite matching, which allows actions at different temporal positions and with different shifts to be compared. Experiments demonstrate that our proposed method achieves state-of-the-art results on multiple benchmarks.
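The two matching strategies described above can be illustrated with a minimal sketch. It is not the implementation behind the reported results, and it rests on several assumptions that the abstract does not state: prototypes are plain feature vectors, similarity is cosine similarity, both videos have the same number of prototypes in each group, and the two group scores are averaged with equal weight. All function names are hypothetical. For brevity, the bipartite matching is solved by brute-force search over permutations, which is only practical for a handful of prototypes; a Hungarian-algorithm solver would normally replace it.

```python
from itertools import permutations
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def global_similarity(support, query):
    """Direct matching: the i-th support prototype is compared only with
    the i-th query prototype, so each pair covers the same aspect."""
    return sum(cosine_similarity(s, q) for s, q in zip(support, query)) / len(support)


def focused_similarity(support, query):
    """Bipartite matching: find the one-to-one assignment of support to
    query prototypes with the highest total similarity (brute force)."""
    n = len(support)
    best_total = max(
        sum(cosine_similarity(support[i], query[j]) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best_total / n


def video_similarity(support_global, support_focused, query_global, query_focused):
    """Combine both scores; the equal weighting is an assumption."""
    return 0.5 * (
        global_similarity(support_global, query_global)
        + focused_similarity(support_focused, query_focused)
    )


# Toy prototypes: the query holds the same two vectors in swapped order.
support = [[1.0, 0.0], [0.0, 1.0]]
query = [[0.0, 1.0], [1.0, 0.0]]

print(global_similarity(support, query))   # order-sensitive comparison
print(focused_similarity(support, query))  # order-invariant comparison
```

In the toy example, the direct comparison penalizes the swapped order, whereas the bipartite matching recovers the correspondence. This mirrors the stated motivation: global prototypes are compared from a fixed perspective, while focused prototypes tolerate temporal shifts.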