As many fine-tuned pre-trained language models~(PLMs) with promising performance are generously released, investigating better ways to reuse these models is vital as it can greatly reduce the retraining computational cost and the potential environmental side-effects. In this paper, we explore a novel model reuse paradigm, Knowledge Amalgamation~(KA) for PLMs. Without human annotations available, KA aims to merge the knowledge from different teacher-PLMs, each of which specializes in a different classification problem, into a versatile student model. To achieve this, we design a Model Uncertainty-aware Knowledge Amalgamation~(MUKA) framework, which identifies the potentially adequate teacher using Monte-Carlo Dropout to approximate the golden supervision that guides the student. Experimental results demonstrate that MUKA achieves substantial improvements over baselines on benchmark datasets. Further analysis shows that MUKA generalizes well under several complicated settings with multiple teacher models, heterogeneous teachers, and even cross-dataset teachers.
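To illustrate the teacher-identification step described above, the following is a minimal, hypothetical sketch (not the paper's released code) of Monte-Carlo Dropout uncertainty estimation: each teacher is run several times with dropout kept active, and the teacher whose averaged prediction has the lowest predictive entropy is treated as the adequate one for a given input. The helper names (`enable_mc_dropout`, `predictive_entropy`, `select_teacher`, `num_passes`) and the HuggingFace-style `model(**inputs).logits` interface are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def enable_mc_dropout(model: torch.nn.Module) -> None:
    """Keep dropout layers stochastic while the rest of the model stays in eval mode."""
    model.eval()
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()


@torch.no_grad()
def predictive_entropy(model: torch.nn.Module, inputs: dict, num_passes: int = 10):
    """Return the mean class distribution and its entropy over `num_passes` stochastic passes."""
    enable_mc_dropout(model)
    # Average the softmax outputs of several stochastic forward passes.
    probs = torch.stack(
        [F.softmax(model(**inputs).logits, dim=-1) for _ in range(num_passes)]
    ).mean(dim=0)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return probs, entropy


@torch.no_grad()
def select_teacher(teachers, inputs: dict, num_passes: int = 10) -> int:
    """Pick the teacher whose Monte-Carlo prediction is least uncertain for this input."""
    entropies = [predictive_entropy(t, inputs, num_passes)[1].mean() for t in teachers]
    return int(torch.stack(entropies).argmin())
```

Under this sketch, the selected teacher's (mean) distribution would serve as the soft supervision for the student on that example; the actual MUKA framework is described in the body of the paper.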