In many applications of machine learning, certain categories of examples may be underrepresented in the training data, causing systems to underperform on such "few-shot" cases at test time. A common remedy is to perform data augmentation, such as by duplicating underrepresented examples, or heuristically synthesizing new examples. But these remedies often fail to cover the full diversity and complexity of real examples. We propose a data augmentation approach that performs neural Example Extrapolation (Ex2). Given a handful of exemplars sampled from some distribution, Ex2 synthesizes new examples that also belong to the same distribution. The Ex2 model is learned by simulating the example generation procedure on data-rich slices of the data, and it is applied to underrepresented, few-shot slices. We apply Ex2 to a range of language understanding tasks and significantly improve over state-of-the-art methods on multiple few-shot learning benchmarks, including for relation extraction (FewRel) and intent classification + slot filling (SNIPS).
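The core recipe described above can be sketched in a few lines: build training pairs on data-rich slices by mapping k exemplars to a held-out example from the same slice, then apply the trained generator to few-shot slices. This is a minimal, hypothetical illustration of the data flow only; the slice names, the `generator` callable, and all function names are assumptions, and the real Ex2 model is a neural sequence generator rather than the stub shown here.

```python
import random

def build_ex2_training_pairs(slices, k=3, seed=0):
    """Simulate the example-generation procedure on data-rich slices:
    each training pair maps k exemplars from a slice to a held-out
    example drawn from the same slice (hypothetical data format)."""
    rng = random.Random(seed)
    pairs = []
    for slice_name, examples in slices.items():
        if len(examples) <= k:
            continue  # slices this small are the few-shot targets, not training data
        for target in examples:
            exemplars = rng.sample([e for e in examples if e != target], k)
            pairs.append((exemplars, target))
    return pairs

def extrapolate(few_shot_exemplars, generator, n_new=5):
    """Apply a trained generator to a few-shot slice's exemplars to
    synthesize new examples from the same (unseen) distribution."""
    return [generator(few_shot_exemplars) for _ in range(n_new)]

# Toy usage with a stub generator standing in for the neural model.
slices = {
    "rich_intent": ["book a flight", "reserve a seat", "find a plane ticket",
                    "get me on a flight", "flight to Boston"],
    "few_shot_intent": ["play some jazz", "put on music"],
}
pairs = build_ex2_training_pairs(slices, k=3)
stub_generator = lambda exemplars: "synthetic variant of: " + exemplars[0]
new_examples = extrapolate(slices["few_shot_intent"], stub_generator, n_new=4)
```

The key design point is that the extrapolator never trains on the few-shot slices themselves; it learns the "exemplars to new example" mapping where data is plentiful and transfers it to where data is scarce.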