Few-shot action recognition, i.e., recognizing new action classes given only a few examples, benefits from incorporating temporal information. Prior work either encodes such information in the representation itself and learns classifiers at test time, or obtains frame-level features and performs pairwise temporal matching. We first evaluate a number of matching-based approaches using features from spatio-temporal backbones, a comparison missing from the literature, and show that the performance gap between simple baselines and more sophisticated methods is significantly reduced. Inspired by this, we propose Chamfer++, a non-temporal matching function that achieves state-of-the-art results in few-shot action recognition. We show that, when starting from temporal features, our parameter-free and interpretable approach outperforms all other matching-based and classifier-based methods for one-shot action recognition on three common datasets, without using temporal information in the matching stage. Project page: https://jbertrand89.github.io/matching-based-fsar
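To make the matching stage concrete, below is a minimal sketch of the basic one-directional Chamfer matching that Chamfer++ builds on: each query frame is matched to its most similar support frame, and the per-frame scores are averaged, with no temporal ordering involved. This is an illustration under assumed tensor shapes, not the authors' implementation; the function names (`chamfer_score`, `classify`) are hypothetical.

```python
import torch
import torch.nn.functional as F

def chamfer_score(query: torch.Tensor, support: torch.Tensor) -> torch.Tensor:
    """One-directional Chamfer similarity between two sets of frame features.

    query:   (n_q, d) frame-level features of the query clip
    support: (n_s, d) frame-level features of a support clip

    Each query frame is matched to its best support frame (max cosine
    similarity); scores are averaged over query frames. No temporal
    ordering is used, so the result is permutation-invariant.
    """
    q = F.normalize(query, dim=-1)
    s = F.normalize(support, dim=-1)
    sim = q @ s.t()                       # (n_q, n_s) cosine similarities
    return sim.max(dim=1).values.mean()   # best match per query frame, averaged

def classify(query_feats: torch.Tensor, support_feats_per_class: list) -> int:
    """One-shot episode: assign the query the class of the best-matching support clip."""
    scores = torch.stack([chamfer_score(query_feats, s) for s in support_feats_per_class])
    return int(scores.argmax())
```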