Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models

As many fine-tuned pre-trained language models (PLMs) with promising performance are freely released, finding better ways to reuse these models is vital, since reuse can greatly reduce the computational cost of retraining and its potential environmental side effects. In this paper, we explore a novel model reuse paradigm for PLMs: Knowledge Amalgamation (KA). Without any human annotations, KA aims to merge the knowledge of several teacher PLMs, each specialized in a different classification problem, into a single versatile student model. To achieve this, we design a Model Uncertainty-aware Knowledge Amalgamation (MUKA) framework, which uses Monte Carlo Dropout to identify a potentially adequate teacher whose predictions approximate the gold supervision used to guide the student. Experimental results demonstrate that MUKA achieves substantial improvements over baselines on benchmark datasets. Further analysis shows that MUKA generalizes well under several complicated settings: multiple teacher models, heterogeneous teachers, and even cross-dataset teachers.
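The abstract only names the mechanism (Monte Carlo Dropout to pick an adequate teacher), so the following is a minimal sketch under stated assumptions, not the authors' MUKA implementation. Assumed for illustration: each teacher is a toy linear classifier standing in for a fine-tuned PLM, dropout stays active at inference, uncertainty is the normalized predictive entropy of the softmax averaged over several stochastic passes, the least-uncertain teacher's averaged distribution becomes the soft target, and the student's label space is the concatenation of the teachers' label sets. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # one weight row per class (hypothetical linear teacher)


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_dropout_predict(weights: Matrix, x: Sequence[float], p: float,
                       n_passes: int, rng: random.Random) -> List[float]:
    """Average the softmax output over n_passes forward passes with dropout left ON."""
    assert 0.0 <= p < 1.0 and n_passes > 0
    mean = [0.0] * len(weights)
    for _ in range(n_passes):
        # Inverted dropout on the input features: drop with prob p, rescale survivors.
        kept = [0.0 if rng.random() < p else xi / (1.0 - p) for xi in x]
        logits = [sum(w * k for w, k in zip(row, kept)) for row in weights]
        for c, q in enumerate(softmax(logits)):
            mean[c] += q / n_passes
    return mean


def normalized_entropy(probs: Sequence[float]) -> float:
    """Predictive entropy scaled to [0, 1]; assumes at least 2 classes."""
    h = -sum(q * math.log(q) for q in probs if q > 0.0)
    return h / math.log(len(probs))


def select_teacher(teachers: Sequence[Matrix], x: Sequence[float], p: float = 0.1,
                   n_passes: int = 20, seed: int = 0) -> Tuple[int, List[float]]:
    """Pick the teacher whose MC-dropout prediction on x is least uncertain."""
    rng = random.Random(seed)
    best_idx, best_u, best_probs = -1, math.inf, []
    for i, weights in enumerate(teachers):
        probs = mc_dropout_predict(weights, x, p, n_passes, rng)
        u = normalized_entropy(probs)
        if u < best_u:
            best_idx, best_u, best_probs = i, u, probs
    return best_idx, best_probs


def student_target(teacher_idx: int, teacher_probs: Sequence[float],
                   label_counts: Sequence[int]) -> List[float]:
    """Place the chosen teacher's soft labels into the student's union label space."""
    offset = sum(label_counts[:teacher_idx])  # teachers' label sets are concatenated
    target = [0.0] * sum(label_counts)
    for c, q in enumerate(teacher_probs):
        target[offset + c] = q
    return target


if __name__ == "__main__":
    confident = [[5.0, 5.0], [-5.0, -5.0]]  # strongly prefers its class 0 on x
    unsure = [[0.01, 0.0], [0.0, 0.01]]     # near-uniform on x
    x = [1.0, 1.0]
    idx, probs = select_teacher([confident, unsure], x)
    print(idx)  # 0
    print(student_target(idx, probs, label_counts=[2, 2]))
```

Dividing the entropy by the log of the label count is one assumed way to compare teachers whose label sets differ in size; another uncertainty measure (for example, the variance of predictions across passes) could be swapped in without changing the overall selection loop.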

Related content

MoDELS

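The ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS) is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS attendees come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience in modeling and model-driven software and systems. This year's edition will give the modeling community an opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, and open source, as well as sustainability. Official website: http://www.modelsconference.org/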

Zero-Shot Learning for Text Classification (May 31, 2020)

Probability Calibration for Knowledge Graph Embedding Models (May 11, 2020)

[IJCAI 2020] TransOMCS: From Linguistic Graphs to Commonsense Knowledge (May 6, 2020)

[Survey on knowledge graph embedding for completion] Embedding Models for Knowledge Base Completion (April 25, 2020)