The lack of large-scale annotated real-world datasets makes transfer learning a necessity for video activity understanding. We aim to develop an effective method for few-shot transfer learning for first-person action classification. We leverage independently trained local visual cues to learn representations that can be transferred from a source domain, which provides primitive action labels, to a different target domain using only a handful of examples. The visual cues we employ include object-object interactions, hand grasps, and motion within regions defined as a function of hand locations. We employ a framework based on meta-learning to extract the distinctive and domain-invariant components of the deployed visual cues. This enables the transfer of action classification models across public datasets captured under diverse scene and action configurations. We present comparative results of our transfer learning methodology and report superior performance over state-of-the-art action classification approaches for both inter-class and inter-dataset transfer.
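To make the episodic few-shot setting concrete, the sketch below shows one way a prototypical-network-style classifier could operate on fused cue embeddings. The function names (`fuse_cues`, `prototypes`, `classify`), the concatenation-based cue fusion, and the Euclidean nearest-prototype rule are illustrative assumptions for this sketch, not the paper's actual meta-learning architecture.

```python
import numpy as np

def fuse_cues(cue_features):
    """Concatenate independently computed visual-cue descriptors
    (e.g. object interaction, hand grasp, hand-centred motion)
    into a single embedding per clip. Concatenation is an
    illustrative choice, not the paper's exact fusion."""
    return np.concatenate(cue_features, axis=-1)

def prototypes(support_embeddings, support_labels, num_classes):
    """Average the support embeddings of each class to obtain one
    prototype per action class (prototypical-network style)."""
    dim = support_embeddings.shape[1]
    protos = np.zeros((num_classes, dim))
    for c in range(num_classes):
        protos[c] = support_embeddings[support_labels == c].mean(axis=0)
    return protos

def classify(query_embeddings, protos):
    """Assign each query clip to the nearest prototype (Euclidean)."""
    dists = np.linalg.norm(
        query_embeddings[:, None, :] - protos[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# Toy target-domain episode: 3 action classes, 5 support clips each.
# Each clip carries three hypothetical cue descriptors.
rng = np.random.default_rng(0)
num_classes, shots = 3, 5

def random_clip():
    return fuse_cues([rng.normal(size=32),   # object-interaction cue
                      rng.normal(size=16),   # hand-grasp cue
                      rng.normal(size=16)])  # hand-motion cue

support = np.stack([random_clip() for _ in range(num_classes * shots)])
labels = np.repeat(np.arange(num_classes), shots)
queries = np.stack([random_clip() for _ in range(10)])

protos = prototypes(support, labels, num_classes)
print(classify(queries, protos))
```

In this setup, adapting to a new target domain requires only computing prototypes from the handful of labelled support clips; no gradient updates are needed at deployment time, which is one common design choice in few-shot transfer.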