Merging Models with Fisher-Weighted Averaging

Averaging the parameters of models that have the same architecture and initialization can provide a means of combining their respective capabilities. In this paper, we take the perspective that this "merging" operation can be seen as choosing parameters that approximately maximize the joint likelihood of the posteriors of the models' parameters. Computing a simple average of the models' parameters therefore corresponds to making an isotropic Gaussian approximation to their posteriors. We develop an alternative merging procedure based on the Laplace approximation where we approximate each model's posterior as a Gaussian distribution whose precision matrix corresponds to its Fisher information. We first show that our "Fisher merging" technique provides a performance boost in settings where simple parameter averaging is currently used -- specifically, robust fine-tuning and model ensembling. Then, we compare merging to standard gradient-based transfer learning and demonstrate that merging enables a fundamentally different method for transferring capabilities across models. Specifically, we show that Fisher merging is competitive with gradient-based transfer learning approaches (while being significantly cheaper) in intermediate-task training and domain-adaptive pre-training. We also show that our merging procedure makes it possible to combine models in previously unexplored ways. We release our code to facilitate future research into methods for merging models.
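The abstract treats merging as maximizing a weighted sum of Gaussian log-posteriors. Each model i contributes a Gaussian centered at its parameters θ_i, with its Fisher F_i as the precision matrix. If the Fisher is approximated by its diagonal, the maximizer has a closed form per parameter: θ*_j = Σ_i λ_i F_ij θ_ij / Σ_i λ_i F_ij. Setting every F_i to the identity recovers the simple average, which is the isotropic case. The sketch below is a minimal illustration under stated assumptions, not the authors' released code. It assumes a diagonal Fisher, equal merging weights λ_i by default, and a toy logistic-regression model whose Fisher has an exact form. It also uses a hypothetical `fisher_floor` that clamps near-zero Fisher entries. The helper names `diagonal_fisher_logreg` and `fisher_merge` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def diagonal_fisher_logreg(w: Sequence[float], xs: Sequence[Sequence[float]]) -> List[float]:
    """Diagonal Fisher of a toy logistic-regression model p(y=1 | x) = sigmoid(w . x).

    Here d/dw_j log p(y | x) = (y - p) * x_j. Taking y from the model's own
    predictive distribution gives E[(y - p)**2] = p * (1 - p), so each diagonal
    entry is the mean of p * (1 - p) * x_j**2 over the inputs.
    """
    fisher = [0.0] * len(w)
    for x in xs:
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
        for j, xj in enumerate(x):
            fisher[j] += p * (1.0 - p) * xj * xj
    return [f / len(xs) for f in fisher]


def fisher_merge(
    params: Sequence[Sequence[float]],
    fishers: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    fisher_floor: float = 1e-6,
) -> List[float]:
    """Per-parameter Fisher-weighted average: sum_i l_i F_ij theta_ij / sum_i l_i F_ij.

    fisher_floor (an assumption of this sketch) clamps tiny Fisher values so the
    denominator never vanishes; if every model's Fisher sits at the floor, the
    result falls back to the weighted simple average.
    """
    if weights is None:
        weights = [1.0 / len(params)] * len(params)  # equal merging weights
    merged = []
    for j in range(len(params[0])):
        num = 0.0
        den = 0.0
        for lam, theta, fisher in zip(weights, params, fishers):
            f = max(fisher[j], fisher_floor)
            num += lam * f * theta[j]
            den += lam * f
        merged.append(num / den)
    return merged


if __name__ == "__main__":
    # Two hypothetical models: model A only ever sees feature 0, model B only feature 1.
    w_a, xs_a = [2.0, 0.1], [[1.0, 0.0], [0.5, 0.0]]
    w_b, xs_b = [0.1, -1.5], [[0.0, 1.0], [0.0, 2.0]]
    fishers = [diagonal_fisher_logreg(w_a, xs_a), diagonal_fisher_logreg(w_b, xs_b)]
    ones = [[1.0, 1.0], [1.0, 1.0]]  # isotropic case: plain parameter averaging
    print("simple average:", fisher_merge([w_a, w_b], ones))
    print("Fisher merge:  ", fisher_merge([w_a, w_b], fishers))
```

In this toy setup, model A's Fisher for feature 1 is zero because A never observed that feature, and the same holds for model B and feature 0. Fisher merging therefore keeps roughly A's weight for feature 0 and B's weight for feature 1. Plain averaging would pull both weights halfway toward values the other model never fit.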
