Layer-wise model fusion via optimal transport, named OTFusion, applies soft neuron association to unify different pre-trained networks, saving computational resources. Despite its success, OTFusion requires the input networks to have the same number of layers. To address this issue, we propose a novel model fusion framework, named CLAFusion, that fuses neural networks with different numbers of layers, which we refer to as heterogeneous neural networks, via cross-layer alignment. The cross-layer alignment problem, which is an unbalanced assignment problem, can be solved efficiently using dynamic programming. Based on the cross-layer alignment, our framework balances the number of layers of the input networks before applying layer-wise model fusion. Our synthetic experiments indicate that the fused network from CLAFusion achieves more favorable performance than the individual networks trained on heterogeneous data, without any retraining. With an extra fine-tuning step, it improves the accuracy of residual networks on the CIFAR10 dataset. Finally, we explore its applicability to model compression and knowledge distillation in the teacher-student setting.
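To make the dynamic-programming step concrete, the sketch below solves a monotone unbalanced assignment: given a layer-dissimilarity cost matrix, it aligns the layers of a shallower network to an order-preserving subset of a deeper network's layers at minimum total cost. This is a minimal illustration of the idea, not the paper's exact formulation; the cost matrix, monotonicity assumption, and all names here are illustrative.

```python
import numpy as np

def cross_layer_alignment(cost):
    """Align m layers of a shallow network to a monotone subset of the
    n layers of a deep network (m <= n), minimizing total cost.

    cost: (m, n) array; cost[i, j] is the dissimilarity between layer i
    of the shallow network and layer j of the deep network (illustrative).
    Returns a list of deep-layer indices, one per shallow-network layer.
    """
    m, n = cost.shape
    assert m <= n, "the first network must not be deeper than the second"
    INF = float("inf")
    # dp[i, j]: best cost of matching the first i shallow layers
    # using only the first j deep layers.
    dp = np.full((m + 1, n + 1), INF)
    dp[0, :] = 0.0
    for i in range(1, m + 1):
        for j in range(i, n + 1):
            # either leave deep layer j unmatched, or match it to shallow layer i
            dp[i, j] = min(dp[i, j - 1], dp[i - 1, j - 1] + cost[i - 1, j - 1])
    # backtrack to recover one optimal alignment
    match, j = [], n
    for i in range(m, 0, -1):
        while j > i and dp[i, j] == dp[i, j - 1]:
            j -= 1  # deep layer j was skipped
        match.append(j - 1)
        j -= 1
    return match[::-1]

# Toy usage: align a 2-layer network to a 4-layer network.
C = np.array([[1.0, 0.2, 3.0, 2.0],
              [4.0, 5.0, 0.1, 2.5]])
print(cross_layer_alignment(C))  # -> [1, 2]
```

The recurrence runs in O(mn) time, which is why the alignment step adds negligible overhead before the layer-wise fusion is applied.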