The success of deep learning is due in large part to our ability to solve certain massive non-convex optimization problems with relative ease. Though non-convex optimization is NP-hard, simple algorithms -- often variants of stochastic gradient descent -- exhibit surprising effectiveness in fitting large neural networks in practice. We argue that neural network loss landscapes contain (nearly) a single basin after accounting for all possible permutation symmetries of hidden units a la Entezari et al. (2021). We introduce three algorithms to permute the units of one model to bring them into alignment with a reference model in order to merge the two models in weight space. This transformation produces a functionally equivalent set of weights that lie in an approximately convex basin near the reference model. Experimentally, we demonstrate the single basin phenomenon across a variety of model architectures and datasets, including the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained ResNet models on CIFAR-10 and CIFAR-100. Additionally, we investigate intriguing phenomena relating model width and training time to mode connectivity. Finally, we discuss shortcomings of the linear mode connectivity hypothesis, including a counterexample to the single basin theory.
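To make the permutation-symmetry and weight-merging ideas concrete, here is a minimal sketch (not the paper's three alignment algorithms): it checks that permuting the hidden units of a small two-layer MLP leaves its function unchanged, and defines a simple weight-space interpolation of the kind used to test for linear mode connectivity. The helper names `mlp` and `interpolate` are illustrative assumptions, not APIs from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    # Two-layer MLP with a ReLU hidden layer.
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

d_in, d_hidden, d_out = 4, 8, 3
W1 = rng.normal(size=(d_in, d_hidden)); b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_hidden, d_out)); b2 = rng.normal(size=d_out)

# Permute the hidden units: permute the columns of W1 (and entries of b1),
# and permute the rows of W2 to match. The network computes the same function.
perm = rng.permutation(d_hidden)
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]

x = rng.normal(size=(5, d_in))
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))

# After aligning one model's units to a reference model, the two can be merged
# by linear interpolation in weight space; zero-barrier linear mode connectivity
# means the loss along this path never rises above the losses at the endpoints.
def interpolate(theta_a, theta_b, lam):
    return [(1 - lam) * a + lam * b for a, b in zip(theta_a, theta_b)]
```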