Large deep learning models have achieved remarkable success in many scenarios. However, training large models is usually challenging, e.g., due to the high computational cost, the unstable and painfully slow optimization procedure, and the vulnerability to overfitting. To alleviate these problems, this work studies a divide-and-conquer strategy, i.e., dividing a large model into smaller modules, training them independently, and reassembling the trained modules to obtain the target model. This approach is promising since it avoids directly training large models from scratch. Nevertheless, implementing this idea is non-trivial, as it is difficult to ensure the compatibility of the independently trained modules. In this paper, we present an elegant solution to address this issue, i.e., we introduce a global, shared meta model to implicitly link all the modules together. This enables us to train highly compatible modules that collaborate effectively when they are assembled together. We further propose a module incubation mechanism that enables the meta model to be designed as an extremely shallow network. As a result, the additional overhead introduced by the meta model is minimized. Though conceptually simple, our method significantly outperforms end-to-end (E2E) training in terms of both final accuracy and training efficiency. For example, on top of ViT-Huge, it improves the accuracy by 2.7% compared to the E2E baseline on ImageNet-1K, while reducing the training cost by 43%. Code is available at https://github.com/LeapLabTHU/Model-Assembling.
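To make the divide-and-assemble workflow described above more concrete, the following is a minimal PyTorch sketch of the idea. Everything here is an illustrative assumption rather than the authors' implementation (see the linked repository for that): the block structure, the `incubate_step` helper, the mean-pooled classification head, the choice to keep the shallow meta model frozen during incubation, and all hyper-parameters are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D, NUM_CLASSES, K = 256, 1000, 4           # embedding dim, classes, number of modules (assumed)

def make_block(depth: int) -> nn.Module:
    # One "module": a stack of transformer encoder layers.
    return nn.Sequential(*[
        nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
        for _ in range(depth)
    ])

# A very shallow meta model spanning the whole network: one thin block per stage,
# plus a simple classification head. Both are kept frozen here (an assumption).
meta_blocks = nn.ModuleList([make_block(depth=1) for _ in range(K)])
head = nn.Linear(D, NUM_CLASSES)
for p in list(meta_blocks.parameters()) + list(head.parameters()):
    p.requires_grad_(False)

def incubate_step(module_idx: int, target_block: nn.Module,
                  tokens: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """One incubation step: substitute the deep target block for the shallow
    meta block at position `module_idx` and compute an end-to-end loss.
    Only the target block receives gradients; the shared meta model acts as a
    frozen scaffold that keeps independently trained modules compatible."""
    x = tokens
    for i in range(K):
        x = (target_block if i == module_idx else meta_blocks[i])(x)
    logits = head(x.mean(dim=1))            # mean-pool tokens, then classify
    return F.cross_entropy(logits, labels)

# Each deep module (e.g., 8 layers here) is incubated independently, after which
# the trained modules are chained to form the final large model.
target_blocks = [make_block(depth=8) for _ in range(K)]
assembled = nn.Sequential(*target_blocks)

# Shape-only usage example (a real run would loop over ImageNet batches):
tokens = torch.randn(2, 197, D)             # (batch, sequence length, embedding)
labels = torch.randint(0, NUM_CLASSES, (2,))
loss = incubate_step(1, target_blocks[1], tokens, labels)
loss.backward()
```

The sketch only shows why a shared meta model can enforce compatibility: every module is optimized against the same frozen surroundings, so the modules see consistent input and output interfaces and can be concatenated after training without further joint tuning.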