Layer-wise model fusion via optimal transport, named OTFusion, applies soft neuron association to unify different pre-trained networks and thereby save computational resources. Despite its success, OTFusion requires the input networks to have the same number of layers. To address this issue, we propose a novel model fusion framework, named CLAFusion, to fuse neural networks with different numbers of layers, which we refer to as heterogeneous neural networks, via cross-layer alignment. The cross-layer alignment problem, which is an unbalanced assignment problem, can be solved efficiently using dynamic programming. Based on the cross-layer alignment, our framework balances the number of layers of the neural networks before applying layer-wise model fusion. Our experiments indicate that CLAFusion, with an extra finetuning step, improves the accuracy of residual networks on the CIFAR10, CIFAR100, and Tiny-ImageNet datasets. Furthermore, we explore its practical usage for model compression and knowledge distillation when applied to the teacher-student setting.
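The abstract states that cross-layer alignment is an unbalanced assignment problem solvable by dynamic programming. The following is a minimal sketch of one such order-preserving DP alignment between a shallower and a deeper network, assuming a generic layer-dissimilarity cost matrix; the function name `cross_layer_alignment` and the placeholder random cost are illustrative only and are not the paper's implementation.

```python
import numpy as np

def cross_layer_alignment(cost):
    """Order-preserving alignment of a shallow network's layers to a deep
    network's layers via dynamic programming.

    cost: (m, n) matrix, cost[i, j] = dissimilarity between layer i of the
          shallow network and layer j of the deep network (m <= n).
    Returns the deep-layer index matched to each shallow layer 0..m-1.
    """
    m, n = cost.shape
    assert m <= n, "the first network must not have more layers than the second"

    # dp[i, j]: minimal cost of aligning the first i shallow layers to the
    # first j deep layers, with matches strictly increasing in both indices.
    dp = np.full((m + 1, n + 1), np.inf)
    dp[0, :] = 0.0
    for i in range(1, m + 1):
        for j in range(i, n + 1):
            skip = dp[i, j - 1]                            # leave deep layer j-1 unmatched
            match = dp[i - 1, j - 1] + cost[i - 1, j - 1]  # match shallow i-1 to deep j-1
            dp[i, j] = min(skip, match)

    # Backtrack to recover one optimal alignment.
    alignment, i, j = [], m, n
    while i > 0:
        if dp[i, j] == dp[i, j - 1]:
            j -= 1
        else:
            alignment.append(j - 1)
            i -= 1
            j -= 1
    return alignment[::-1]

# Toy usage with a placeholder cost matrix (the paper derives the cost from
# layer representations; random values here only demonstrate the DP).
rng = np.random.default_rng(0)
toy_cost = rng.random((3, 5))
print(cross_layer_alignment(toy_cost))
```

The recurrence mirrors standard sequence alignment: each shallow layer must be matched to exactly one deep layer while preserving layer order, so the table needs only O(mn) time and space.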