Recent work has focused on compressing pre-trained language models (PLMs) like BERT, where the major focus has been improving in-distribution performance on downstream tasks. However, very few of these studies have analyzed the impact of compression on the generalizability and robustness of compressed models for out-of-distribution (OOD) data. Towards this end, we study two popular model compression techniques, knowledge distillation and pruning, and show that the compressed models are significantly less robust than their PLM counterparts on OOD test sets, although they obtain similar performance on in-distribution development sets for a task. Further analysis indicates that the compressed models overfit to shortcut samples and generalize poorly on hard ones. We leverage this observation to develop a regularization strategy for robust model compression based on sample uncertainty. Experimental results on several natural language understanding tasks demonstrate that our bias mitigation framework improves the OOD generalization of the compressed models without sacrificing in-distribution task performance.
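The abstract does not spell out the regularization, so here is a minimal, hypothetical sketch of one way sample uncertainty could reweight a distillation loss: the teacher's predictive entropy serves as the uncertainty score, and confident (potentially shortcut) samples are down-weighted relative to uncertain (hard) ones. All function names, the entropy-based uncertainty measure, and the mean-normalized weighting are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sample_uncertainty(teacher_logits):
    # Predictive entropy of the teacher, per sample.
    # Low entropy = confident teacher (possible shortcut sample);
    # high entropy = hard sample. (Illustrative choice of uncertainty.)
    p = softmax(teacher_logits)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def weighted_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Standard soft-target distillation: cross-entropy between
    # temperature-scaled teacher and student distributions ...
    t = softmax(teacher_logits / temperature)
    s = softmax(student_logits / temperature)
    per_sample = -(t * np.log(s + 1e-12)).sum(axis=-1)
    # ... reweighted so uncertain (hard) samples count more and
    # confident (shortcut-prone) samples count less.
    w = sample_uncertainty(teacher_logits)
    w = w / (w.mean() + 1e-12)  # normalize weights to mean 1
    return (w * per_sample).mean()

# Toy usage: sample 0 has a very confident teacher, sample 1 does not.
teacher = np.array([[5.0, 0.0], [0.1, 0.0]])
student = np.zeros((2, 2))
u = sample_uncertainty(teacher)
loss = weighted_distillation_loss(student, teacher)
```

In this sketch the hard sample (near-uniform teacher distribution) receives a larger weight than the easy one, which is the qualitative behavior the abstract's analysis motivates.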