This paper asks whether extrapolating the hidden-space distribution of text examples from one class onto another is a valid inductive bias for data augmentation. To operationalize this question, I propose a simple data augmentation protocol called "good-enough example extrapolation" (GE3). GE3 is lightweight and has no hyperparameters. Applied to three text classification datasets under various data-imbalance scenarios, GE3 improves performance more than upsampling and other hidden-space data augmentation methods.
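The abstract does not say exactly how one class's hidden-space distribution is extrapolated onto another. The sketch below assumes one plausible, hyperparameter-free reading: each hidden vector is re-centered from its own class mean onto a target class mean, h' = h - mu_src + mu_tgt. The function names `class_means` and `extrapolate` are hypothetical, and the code works on plain Python lists of precomputed hidden vectors rather than on any particular encoder.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def class_means(hidden: Sequence[Vector], labels: Sequence[int]) -> Dict[int, Vector]:
    """Average the hidden vectors of each class (all vectors assumed equal length)."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for h, y in zip(hidden, labels):
        if y not in sums:
            sums[y] = [0.0] * len(h)
        sums[y] = [s + x for s, x in zip(sums[y], h)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def extrapolate(
    hidden: Sequence[Vector], labels: Sequence[int]
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical GE3-style augmentation (an assumed reading, not the paper's code).

    Every example is copied into every other class by replacing its own
    class mean with the target class mean: h' = h - mu_src + mu_tgt.
    """
    means = class_means(hidden, labels)
    aug_hidden: List[Vector] = []
    aug_labels: List[int] = []
    for h, src in zip(hidden, labels):
        mu_src = means[src]
        for tgt, mu_tgt in means.items():
            if tgt == src:
                continue  # only extrapolate onto *other* classes
            aug_hidden.append([x - a + b for x, a, b in zip(h, mu_src, mu_tgt)])
            aug_labels.append(tgt)
    return aug_hidden, aug_labels
```

For example, with hidden vectors `[[1.0, 0.0], [3.0, 0.0], [0.0, 10.0]]` and labels `[0, 0, 1]`, `extrapolate` returns `[[-1.0, 10.0], [1.0, 10.0], [2.0, 0.0]]` with labels `[1, 1, 0]`: the two new class-1 vectors keep the spread of class 0 but sit around the class-1 mean. In an imbalanced setting one would presumably keep only the extrapolated examples whose target is a minority class and then train the classifier on the original plus augmented vectors; that filtering step is also an assumption.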