Recent years have witnessed remarkable success of large deep learning models. However, training these models is challenging due to high computational costs, painfully slow convergence, and overfitting issues. In this paper, we present Deep Incubation, a novel approach that enables the efficient and effective training of large models by dividing them into smaller sub-modules that can be trained separately and assembled seamlessly. A key challenge in implementing this idea is ensuring the compatibility of the independently trained sub-modules. To address this issue, we first introduce a global, shared meta model, which is leveraged to implicitly link all the modules together, and can be designed as an extremely small network with negligible computational overhead. We then propose a module incubation algorithm, which trains each sub-module to replace the corresponding component of the meta model and accomplish a given learning task. Despite its simplicity, our approach effectively encourages each sub-module to be aware of its role in the target large model, such that the finally learned sub-modules can collaborate with each other smoothly after being assembled. Empirically, our method outperforms end-to-end (E2E) training in terms of both final accuracy and training efficiency. For example, on top of ViT-Huge, it improves the accuracy by 2.7% on ImageNet, or achieves similar performance with 4x less training time. Notably, the gains are significant for downstream tasks as well (e.g., object detection and image segmentation on COCO and ADE20K). Code is available at https://github.com/LeapLabTHU/Deep-Incubation.
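The incubation procedure described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: the meta model is reduced to a chain of trivial placeholder transforms, and "training" a sub-module is replaced by simply supplying a fitted function, purely to show how each sub-module is optimized in the context of the frozen meta model and how the incubated sub-modules are assembled at the end. All names here are illustrative.

```python
# Hypothetical simplification of Deep Incubation's structure:
# a model is a chain of modules; during incubation, the i-th meta
# module is swapped out for the sub-module being trained, while the
# remaining (cheap, shared) meta modules stay fixed.

def make_meta_model(n_modules):
    # Each meta module is a negligible-cost placeholder transform
    # standing in for a tiny shared network.
    return [lambda x: x + 1 for _ in range(n_modules)]

def hybrid_model(meta, sub_module, i):
    # Replace the i-th component of the meta model with the
    # sub-module under incubation; the rest of the chain is frozen.
    chain = list(meta)
    chain[i] = sub_module

    def forward(x):
        for module in chain:
            x = module(x)
        return x
    return forward

def assemble(sub_modules):
    # Final large model: all independently incubated sub-modules,
    # chained in order. Because each was trained against the same
    # shared meta model, they remain mutually compatible.
    def forward(x):
        for module in sub_modules:
            x = module(x)
        return x
    return forward
```

Usage follows the same pattern for every position: build `hybrid_model(meta, sub_i, i)`, optimize `sub_i` on the task loss through that hybrid, repeat for each `i` (these runs are independent and can proceed in parallel), then call `assemble` on the trained sub-modules.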