While large-scale pretrained language models have obtained impressive results when fine-tuned on a wide variety of tasks, they still often suffer from overfitting in low-resource scenarios. Since such models are general-purpose feature extractors, many of their features are inevitably irrelevant for a given target task. We propose to use a Variational Information Bottleneck (VIB) to suppress irrelevant features when fine-tuning on low-resource target tasks, and show that our method successfully reduces overfitting. Moreover, we show that our VIB model finds sentence representations that are more robust to biases in natural language inference datasets, and thereby obtains better generalization to out-of-domain datasets. Evaluation on seven low-resource datasets across different tasks shows that our method significantly improves transfer learning in low-resource scenarios, surpassing prior work. Moreover, it improves generalization on 13 out of 15 out-of-domain natural language inference benchmarks. Our code is publicly available at https://github.com/rabeehk/vibert.
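For reference, the standard VIB training objective (Alemi et al., 2017) that this kind of approach builds on can be written as below; here $p_\theta(z \mid x)$ is a stochastic encoder mapping the input representation $x$ to a compressed code $z$, $q_\phi(y \mid z)$ is the task classifier, $r(z)$ is a fixed prior (e.g., a standard Gaussian), and $\beta$ trades off compression against task accuracy. The exact parameterization used in this work may differ; this is only a sketch of the general objective.

\[
\mathcal{L}_{\mathrm{VIB}} \;=\; \mathbb{E}_{x,y}\Big[\, \mathbb{E}_{z \sim p_\theta(z \mid x)}\big[-\log q_\phi(y \mid z)\big] \;+\; \beta\, \mathrm{KL}\big(p_\theta(z \mid x)\,\|\, r(z)\big) \Big]
\]

The KL term penalizes information in $z$ that is not needed to predict $y$, which is the mechanism by which irrelevant features of the pretrained representation are suppressed.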