Classification of new class entities requires collecting and annotating hundreds or thousands of samples, which is often prohibitively costly. Few-shot learning aims to classify new classes using only a few examples. Only a small number of studies address the challenge of few-shot learning on spatio-temporal patterns such as videos. In this paper, we present the Temporal Aware Embedding Network (TAEN) for few-shot action recognition, which learns to represent actions in a metric space as trajectories, conveying both short-term semantics and longer-term connectivity between action parts. We demonstrate the effectiveness of TAEN on two few-shot tasks, video classification and temporal action detection, and evaluate our method on the Kinetics-400 and ActivityNet 1.2 few-shot benchmarks. By training just a few fully connected layers, we reach results comparable to prior work on both few-shot video classification and temporal detection, while achieving state-of-the-art performance in certain scenarios.