Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data

Few-shot learning involves learning an effective model from only a few labeled datapoints. The use of a small training set makes it difficult to avoid overfitting but also makes few-shot learning applicable to many important real-world settings. In this work, we focus on Few-shot Learning with Auxiliary Data (FLAD), a training paradigm that assumes access to auxiliary data during few-shot learning in hopes of improving generalization. Introducing auxiliary data during few-shot learning leads to essential design choices where hand-designed heuristics can lead to sub-optimal performance. In this work, we focus on automated sampling strategies for FLAD and relate them to the explore-exploit dilemma that is central in multi-armed bandit settings. Based on this connection we propose two algorithms -- EXP3-FLAD and UCB1-FLAD -- and compare them with methods that either explore or exploit, finding that the combination of exploration and exploitation is crucial. Using our proposed algorithms to train T5 yields a 9% absolute improvement over the explicitly multi-task pre-trained T0 model across 11 datasets.
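The explore-exploit framing above can be illustrated with the textbook UCB1 rule. This is only a minimal sketch under stated assumptions, not the paper's UCB1-FLAD: each bandit arm is assumed to stand for one auxiliary dataset, and `reward_fn` is a hypothetical callback returning a reward in [0, 1] for sampling a batch from that dataset. The abstract does not specify how rewards are computed, so that part is left abstract here.

```python
import math


def ucb1_select(counts, values, t):
    """Pick an arm with the standard UCB1 rule.

    counts[a]: how many times arm a has been played so far.
    values[a]: running mean reward of arm a.
    t: current 1-based step index.
    """
    # Play every arm once before using the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Mean reward (exploitation) plus an uncertainty bonus (exploration).
    scores = [
        values[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(reward_fn, num_arms, num_steps):
    """Run UCB1 for num_steps and return (counts, values).

    reward_fn is a hypothetical stand-in: in a FLAD-style setup it would
    score how useful a batch from auxiliary dataset `arm` was.
    """
    counts = [0] * num_arms
    values = [0.0] * num_arms
    for t in range(1, num_steps + 1):
        arm = ucb1_select(counts, values, t)
        reward = reward_fn(arm)
        counts[arm] += 1
        # Incremental update of the running mean.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


if __name__ == "__main__":
    # Toy, made-up rewards purely for illustration.
    toy_rewards = [0.1, 0.9, 0.5]
    counts, values = run_ucb1(lambda arm: toy_rewards[arm], 3, 300)
    print(counts, values)
```

A purely exploiting strategy would drop the square-root bonus, and a purely exploring one would ignore `values`. UCB1 combines the two, which is the kind of combination the abstract reports as crucial.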


Related content

Few-shot learning

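Few-shot learning (FSL) addresses how to improve a neural network's performance when only a small amount of data is available. One class of methods commonly used in FSL is meta-learning. Like ordinary neural network training, meta-learning has a training phase and a testing phase, but these are called meta-training and meta-testing, respectively.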
