The public model zoo, which contains numerous powerful pretrained model families (e.g., ResNet/DeiT), has reached an unprecedented scope, significantly contributing to the success of deep learning. As each model family consists of pretrained models at diverse scales (e.g., DeiT-Ti/S/B), a fundamental question naturally arises: how can these readily available models within a family be efficiently assembled for dynamic accuracy-efficiency trade-offs at runtime? To this end, we present Stitchable Neural Networks (SN-Net), a novel scalable and efficient framework for model deployment. Given a family of pretrained neural networks, which we call anchors, it cheaply produces numerous networks with different complexity and performance trade-offs. Specifically, SN-Net splits the anchors across their blocks/layers and then stitches them together with simple stitching layers that map the activations from one anchor to another. With only a few epochs of training, SN-Net effectively interpolates between the performance of anchors of varying scales. At runtime, SN-Net can instantly adapt to dynamic resource constraints by switching stitching positions. Extensive experiments on ImageNet classification demonstrate that SN-Net obtains on-par or even better performance than many individually trained networks while supporting diverse deployment scenarios. For example, by stitching Swin Transformers, we challenge hundreds of models in the Timm model zoo with a single network. We believe this new elastic model framework can serve as a strong baseline for further research in wider communities.
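As a rough illustration of the stitching idea, a stitching layer can be viewed as a learned linear map that projects activations from one anchor's feature dimension to another's. The sketch below is an assumption-laden toy, not the paper's exact implementation: the dimensions (192 for DeiT-Ti tokens, 384 for DeiT-S tokens) match the public DeiT configurations, but the random weights stand in for parameters that SN-Net would actually train.

```python
import numpy as np

def stitching_layer(x, W, b):
    """Project activations from a small anchor's feature space into a
    larger anchor's feature space via a linear map (a common choice for
    stitching layers; the exact parameterization here is an assumption)."""
    return x @ W + b

# Hypothetical token activations from a DeiT-Ti block: [num_tokens, 192]
rng = np.random.default_rng(0)
tokens_ti = rng.standard_normal((197, 192))

# In SN-Net these weights would be learned during the few-epoch training;
# here they are random placeholders just to show the shape transformation.
W = rng.standard_normal((192, 384)) * 0.01
b = np.zeros(384)

# Stitched activations now match DeiT-S's expected input: [num_tokens, 384]
tokens_s = stitching_layer(tokens_ti, W, b)
print(tokens_s.shape)  # (197, 384)
```

Once activations are mapped into the larger anchor's feature space, the remaining blocks of that anchor can consume them directly, which is what lets a single stitched network interpolate between the two anchors' accuracy-efficiency operating points.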