Self-supervised learning has been successfully applied to pre-training video representations, with the aim of efficient adaptation from the pre-training domain to downstream tasks. Existing approaches merely leverage a contrastive loss to learn instance-level discrimination. However, the lack of category information leads to a hard-positive problem that constrains the generalization ability of such methods. We find that the multi-task process of meta learning can provide a solution to this problem. In this paper, we propose a Meta-Contrastive Network (MCN), which combines contrastive learning and meta learning to enhance the learning ability of existing self-supervised approaches. Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch. Extensive evaluations demonstrate the effectiveness of our method. On two downstream tasks, i.e., video action recognition and video retrieval, MCN outperforms state-of-the-art approaches on the UCF101 and HMDB51 datasets. More specifically, with an R(2+1)D backbone, MCN achieves Top-1 accuracies of 84.8% and 54.5% for video action recognition, and 52.5% and 23.7% for video retrieval.
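The abstract does not specify the exact contrastive objective, but instance-level discrimination losses of this kind are commonly instantiated as InfoNCE: each clip embedding is pulled toward an augmented view of itself (the positive) and pushed away from other clips in the batch (the negatives). The sketch below is a minimal NumPy illustration of such a loss; the function name, temperature value, and cosine-similarity choice are assumptions for illustration, not details from the paper.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor embedding.

    anchor, positive: (d,) embeddings of two augmented views of one clip.
    negatives: (n, d) embeddings of other clips in the batch.
    Note: this is an illustrative sketch, not the MCN objective itself.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Similarity logits: positive pair first, then the negatives.
    pos_logit = cos(anchor, positive) / temperature
    neg_logits = np.array([cos(anchor, n) for n in negatives]) / temperature
    logits = np.concatenate([[pos_logit], neg_logits])

    # Cross-entropy with the positive as the target class.
    logits -= logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())
```

The hard-positive problem the abstract refers to arises here: because the loss treats every other clip in the batch as a negative, two clips of the same action category are still pushed apart, which is what the meta branch is meant to mitigate.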