Despite achieving state-of-the-art zero-shot performance, existing vision-language models still fall short in few-shot transfer ability on domain-specific problems. Classical fine-tuning often fails to prevent highly expressive models from exploiting spurious correlations. Although model-agnostic meta-learning (MAML) presents itself as a natural alternative for few-shot transfer learning, its expensive computation, due to implicit second-order optimization, limits its use on large-scale vision-language models such as CLIP. While much literature has been devoted to exploring alternative optimization strategies, we identify another essential aspect of effective few-shot transfer learning, task sampling, which has previously been viewed only as part of data pre-processing in MAML. To show the impact of task sampling, we propose a simple algorithm, Model-Agnostic Multitask Fine-tuning (MAMF), which differs from classical fine-tuning only in uniformly sampling multiple tasks. Despite its simplicity, we show that MAMF consistently outperforms classical fine-tuning on five few-shot vision-language classification tasks. We further show that, in the context of few-shot vision-language classification, the effectiveness of the bi-level optimization in MAML is highly sensitive to the zero-shot performance of a task. The goal of this paper is to provide new insights into what makes few-shot learning work, and to encourage further research into better task sampling strategies.
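For concreteness, a minimal sketch of the multitask fine-tuning idea described above is given below. It only illustrates the single change relative to classical fine-tuning, namely that each gradient step is taken on a few-shot task sampled uniformly at random, using plain first-order updates rather than MAML's bi-level optimization. The model, task, and loss objects, the `sample_batch` helper, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Sketch of Model-Agnostic Multitask Fine-tuning (MAMF) as described in the abstract:
# classical fine-tuning, except that each update is computed on a uniformly sampled
# few-shot task. No second-order (bi-level) optimization is involved.
import random
import torch

def mamf_finetune(model, tasks, loss_fn, num_steps=100, lr=1e-5):
    """Fine-tune `model` over uniformly sampled few-shot tasks.

    Args:
        model: a pretrained vision-language model (e.g. a CLIP-like classifier).
        tasks: a list of few-shot tasks; each is assumed to expose sample_batch(),
               a hypothetical helper returning an (inputs, labels) batch.
        loss_fn: a standard classification loss such as cross-entropy.
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_steps):
        task = random.choice(tasks)           # uniform task sampling
        inputs, labels = task.sample_batch()  # few-shot batch from the chosen task
        loss = loss_fn(model(inputs), labels)
        optimizer.zero_grad()
        loss.backward()                       # single-level, first-order update
        optimizer.step()
    return model
```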