Language models built on the Transformer architecture have shown great performance in natural language processing. However, problems such as over-fitting and representation collapse still arise when fine-tuning pre-trained language models (PLMs) on downstream tasks. In this work, we propose HyPe, a simple yet effective fine-tuning technique that alleviates these problems by perturbing the hidden representations of Transformer layers. Unlike previous works that only add noise to inputs or parameters, we argue that the hidden representations of Transformer layers convey more diverse and meaningful language information. Therefore, making Transformer layers more robust to perturbations of their hidden representations can further benefit the fine-tuning of PLMs as a whole. We conduct extensive experiments and analyses on GLUE and other natural language inference datasets. Results demonstrate that HyPe outperforms vanilla fine-tuning and enhances the generalization of hidden representations from different layers. In addition, HyPe incurs negligible computational overhead, and it both outperforms and is compatible with previous state-of-the-art fine-tuning techniques.
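To make the mechanism concrete, the following is a minimal PyTorch-style sketch of hidden-representation perturbation during fine-tuning. It assumes noise is drawn from either a normal or a uniform distribution and added to each encoder layer's hidden states only in training mode; the helper names (hype_perturb, make_hype_hook), the hook-based wiring, and the noise scale epsilon are illustrative assumptions, not the paper's reference implementation.

    import torch

    def hype_perturb(hidden_states: torch.Tensor,
                     noise_type: str = "normal",
                     epsilon: float = 1e-5) -> torch.Tensor:
        # Add small random noise to a layer's hidden representations.
        if noise_type == "normal":
            noise = torch.randn_like(hidden_states) * epsilon
        elif noise_type == "uniform":
            noise = (torch.rand_like(hidden_states) * 2.0 - 1.0) * epsilon
        else:
            raise ValueError(f"unknown noise type: {noise_type}")
        return hidden_states + noise

    def make_hype_hook(noise_type: str = "normal", epsilon: float = 1e-5):
        # Forward hook that perturbs the first element (the hidden states) of a
        # Transformer layer's output tuple, only while the model is training.
        def hook(module, inputs, output):
            if module.training:
                hidden, *rest = output
                return (hype_perturb(hidden, noise_type, epsilon), *rest)
            return output
        return hook

    # Hypothetical usage with a Hugging Face BERT-style encoder:
    # for layer in model.bert.encoder.layer:
    #     layer.register_forward_hook(make_hype_hook("uniform", 1e-5))

In this sketch the perturbation is disabled automatically at evaluation time because the hook checks module.training, so inference behaves exactly as in vanilla fine-tuning.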