Transfer learning provides a way of leveraging knowledge from one task when learning another task. Performing transfer learning typically involves iteratively updating a model's parameters through gradient descent on a training dataset. In this paper, we introduce a fundamentally different method for transferring knowledge across models that amounts to "merging" multiple models into one. Our approach effectively involves computing a weighted average of the models' parameters. We show that this averaging is equivalent to approximately sampling from the posteriors of the models' weights. While using an isotropic Gaussian approximation works well in some cases, we also demonstrate benefits from approximating the precision matrix via the Fisher information. In sum, our approach makes it possible to combine the "knowledge" in multiple models at an extremely low computational cost compared to standard gradient-based training. We demonstrate that model merging achieves performance comparable to gradient descent-based transfer learning on intermediate-task training and domain adaptation problems. We also show that our merging procedure makes it possible to combine models in previously unexplored ways. To measure the robustness of our approach, we perform an extensive ablation on the design of our algorithm.
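To make the merging operation concrete, the following is a minimal sketch of (Fisher-)weighted parameter averaging, not the paper's reference implementation. It assumes each model is supplied as a dict of parameter tensors and that `fishers` holds per-parameter diagonal Fisher estimates of matching shapes; the function and argument names are illustrative only.

```python
import torch


def merge_models(param_dicts, fishers=None, eps=1e-8):
    """Merge models via (Fisher-)weighted averaging of their parameters.

    With `fishers=None`, every model receives a uniform weight, which
    corresponds to the isotropic Gaussian posterior approximation (a plain
    average). Otherwise each parameter is weighted by its diagonal Fisher
    estimate, used as an approximation of the posterior precision.
    """
    if fishers is None:
        # Uniform weights: plain parameter averaging.
        fishers = [{name: torch.ones_like(p) for name, p in params.items()}
                   for params in param_dicts]

    merged = {}
    for name in param_dicts[0]:
        weighted_sum = sum(f[name] * params[name]
                           for params, f in zip(param_dicts, fishers))
        total_weight = sum(f[name] for f in fishers)
        merged[name] = weighted_sum / (total_weight + eps)
    return merged
```

Because the procedure only reads and averages existing parameters, its cost is a single pass over the models' weights, which is the source of the low computational cost claimed above relative to gradient-based training.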