Due to their huge number of parameters, fine-tuning pretrained language models (PLMs) is prone to overfitting in low-resource scenarios. In this work, we present a novel method that operates on the hidden representations of a PLM to reduce overfitting. During fine-tuning, our method inserts random autoencoders between the hidden layers of a PLM, which transform activations from the previous layers into a multi-view compressed representation before feeding it into the upper layers. The autoencoders are plugged out after fine-tuning, so our method adds no extra parameters and incurs no additional computation cost at inference time. Our method demonstrates promising performance improvements across a wide range of sequence- and token-level low-resource NLP tasks.
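To make the mechanism concrete, below is a minimal PyTorch sketch of the idea described above: a wrapper that, during training only, randomly routes a layer's output through one of several bottleneck autoencoders, and acts as a pass-through at inference. The module names (`BottleneckAutoencoder`, `RandomAutoencoderLayer`), the bottleneck sizes, and the random selection scheme are illustrative assumptions, not the paper's exact implementation.

```python
import random
import torch
import torch.nn as nn

class BottleneckAutoencoder(nn.Module):
    """Compresses hidden states to a smaller dimension and reconstructs
    them, producing one compressed 'view' of the activation."""
    def __init__(self, hidden_size: int, bottleneck_size: int):
        super().__init__()
        self.encoder = nn.Linear(hidden_size, bottleneck_size)
        self.decoder = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.decoder(torch.relu(self.encoder(hidden_states)))


class RandomAutoencoderLayer(nn.Module):
    """Wraps an existing PLM layer. During fine-tuning it randomly passes
    the layer's output through one of several autoencoders (or leaves it
    unchanged). At inference the wrapper is a no-op, so no parameters or
    compute are added to the deployed model."""
    def __init__(self, plm_layer: nn.Module, hidden_size: int,
                 bottleneck_sizes=(128, 256, 384)):
        super().__init__()
        self.plm_layer = plm_layer
        self.autoencoders = nn.ModuleList(
            [BottleneckAutoencoder(hidden_size, b) for b in bottleneck_sizes]
        )

    def forward(self, hidden_states, *args, **kwargs):
        outputs = self.plm_layer(hidden_states, *args, **kwargs)
        if not self.training:
            return outputs  # autoencoders are effectively "plugged out"
        hidden = outputs[0] if isinstance(outputs, tuple) else outputs
        # Randomly pick one autoencoder view, or skip compression entirely.
        choice = random.randrange(len(self.autoencoders) + 1)
        if choice < len(self.autoencoders):
            hidden = self.autoencoders[choice](hidden)
        if isinstance(outputs, tuple):
            return (hidden,) + outputs[1:]
        return hidden
```

In practice, one would wrap selected transformer layers of the PLM with `RandomAutoencoderLayer` before fine-tuning and restore the original layers afterwards, so the inference-time model is identical to the vanilla PLM.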