Recent years have witnessed remarkable success of large deep learning models. However, training these models is challenging due to high computational costs, painfully slow convergence, and overfitting issues. In this paper, we present Deep Incubation, a novel approach that enables the efficient and effective training of large models by dividing them into smaller sub-modules that can be trained separately and assembled seamlessly. A key challenge in implementing this idea is ensuring the compatibility of the independently trained sub-modules. To address this issue, we first introduce a global, shared meta model, which is leveraged to implicitly link all the modules together, and can be designed as an extremely small network with negligible computational overhead. We then propose a module incubation algorithm, which trains each sub-module to replace the corresponding component of the meta model and accomplish a given learning task. Despite its simplicity, our approach effectively encourages each sub-module to be aware of its role in the target large model, such that the finally learned sub-modules can collaborate with each other smoothly after being assembled. Empirically, our method outperforms end-to-end (E2E) training in terms of both final accuracy and training efficiency. For example, on top of ViT-Huge, it improves the accuracy by 2.7% on ImageNet, or achieves similar performance with 4x less training time. Notably, the gains are significant for downstream tasks as well (e.g., object detection and image segmentation on COCO and ADE20K). Code is available at https://github.com/LeapLabTHU/Deep-Incubation.
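The incubation procedure described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: the meta model is reduced to a chain of trivial placeholder transforms, and "training" a sub-module is replaced by simply supplying a fitted function, purely to show how each sub-module is optimized in the context of the frozen meta model and how the incubated sub-modules are assembled at the end. All names here are illustrative.

```python
# Hypothetical simplification of Deep Incubation's structure:
# a model is a chain of modules; during incubation, the i-th meta
# module is swapped out for the sub-module being trained, while the
# remaining (cheap, shared) meta modules stay fixed.

def make_meta_model(n_modules):
    # Each meta module is a negligible-cost placeholder transform
    # standing in for a tiny shared network.
    return [lambda x: x + 1 for _ in range(n_modules)]

def hybrid_model(meta, sub_module, i):
    # Replace the i-th component of the meta model with the
    # sub-module under incubation; the rest of the chain is frozen.
    chain = list(meta)
    chain[i] = sub_module

    def forward(x):
        for module in chain:
            x = module(x)
        return x
    return forward

def assemble(sub_modules):
    # Final large model: all independently incubated sub-modules,
    # chained in order. Because each was trained against the same
    # shared meta model, they remain mutually compatible.
    def forward(x):
        for module in sub_modules:
            x = module(x)
        return x
    return forward
```

Usage follows the same pattern for every position: build `hybrid_model(meta, sub_i, i)`, optimize `sub_i` on the task loss through that hybrid, repeat for each `i` (these runs are independent and can proceed in parallel), then call `assemble` on the trained sub-modules.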