Dataless Knowledge Fusion by Merging Weights of Language Models

Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. Fine-tuned models are often readily available while their training data is not, due to data privacy or intellectual property concerns. This creates a barrier to fusing knowledge across individual models to yield a better single model. In this paper, we study the problem of merging individual models built on different training data sets to obtain a single model that both performs well across all data set domains and generalizes to out-of-domain data. We propose a dataless knowledge fusion method that merges models in their parameter space, guided by weights that minimize prediction differences between the merged model and the individual models. Over a battery of evaluation settings, we show that the proposed method significantly outperforms baselines such as Fisher-weighted averaging or model ensembling. Further, we find that our method is a promising alternative to multi-task learning that can preserve or sometimes improve over the individual models without access to the training data. Finally, model merging is more efficient than training a multi-task model, which makes it applicable to a wider set of scenarios.
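The abstract does not spell out the merging formula, so the following is only a minimal sketch of one way to read "weights that minimize prediction differences", restricted to a single linear layer. It assumes each model owner shares the layer weights W_i and the Gram matrix G_i = X_i^T X_i of that layer's inputs rather than the inputs themselves, and it solves the least-squares problem min_W sum_i ||X_i W - X_i W_i||^2 in closed form. The function names `merge_linear` and `prediction_gap` and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def merge_linear(weights, grams):
    """Merge linear-layer weights W_i (d_in x d_out) given Gram matrices G_i.

    Assumption for illustration: G_i = X_i^T X_i, so the result minimizes
    sum_i ||X_i W - X_i W_i||_F^2 without needing the raw inputs X_i.
    """
    gram_sum = sum(grams)
    weighted = sum(g @ w for g, w in zip(grams, weights))
    # Solve (sum_i G_i) W = sum_i G_i W_i instead of forming an explicit inverse.
    return np.linalg.solve(gram_sum, weighted)


def prediction_gap(w, weights, inputs):
    """Total squared difference between merged and individual layer outputs."""
    return sum(np.linalg.norm(x @ w - x @ wi) ** 2 for x, wi in zip(inputs, weights))


# Synthetic stand-ins for two fine-tuned models with different input statistics.
rng = np.random.default_rng(0)
d_in, d_out = 4, 3
inputs = [
    rng.normal(size=(50, d_in)) * rng.uniform(0.1, 3.0, size=d_in)
    for _ in range(2)
]
weights = [rng.normal(size=(d_in, d_out)) for _ in inputs]
grams = [x.T @ x for x in inputs]

merged = merge_linear(weights, grams)
averaged = sum(weights) / len(weights)  # plain parameter averaging, for comparison

print("gap (merged):  ", prediction_gap(merged, weights, inputs))
print("gap (averaged):", prediction_gap(averaged, weights, inputs))
```

Under these assumptions the merged weights never have a larger prediction gap than the plain average, and when every Gram matrix is the identity the two coincide. A real language model would need this applied per layer, with some care for ill-conditioned Gram matrices.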

