Scaling up deep neural networks has proven effective in improving model quality, but it also brings ever-growing training challenges. This paper presents Whale, an automatic and hardware-aware distributed training framework for giant models. Whale generalizes the expression of parallelism with four primitives, which can define various parallel strategies as well as flexible hybrid strategies, including combination and nesting patterns. It allows users to build models at an arbitrary scale by adding a few annotations, and it automatically transforms the local model into a distributed implementation. Moreover, Whale is hardware-aware and remains highly efficient even when training on GPUs of mixed types, meeting the growing demand for heterogeneous training in industrial clusters. Whale sets a milestone for training M6, the largest multimodal pretrained model. The success of M6 is achieved by Whale's design, which decouples algorithm modeling from system implementation: algorithm developers can focus on model innovation, since it takes only three lines of code to scale the M6 model to trillions of parameters on a cluster of 480 GPUs.
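To make the annotation idea concrete, the minimal sketch below illustrates the scope-based annotation style the abstract describes: a user writes an ordinary local model and wraps parts of it in parallelism annotations, leaving the distributed transformation to the framework. The `replica` and `split` context managers here are stand-ins defined inside the snippet purely for illustration; they are not Whale's actual API, whose module name, primitive names, and signatures differ.

```python
# Conceptual sketch only: these context managers are local stand-ins written
# for this example; they are NOT Whale's real API.
from contextlib import contextmanager

@contextmanager
def replica():
    """Stand-in for a 'replicate this scope across devices' annotation."""
    yield

@contextmanager
def split():
    """Stand-in for a 'split this scope across devices' annotation."""
    yield

def local_model(x):
    # The user writes ordinary local model code; annotations wrap parts of it.
    with replica():           # hypothetical data-parallel region
        hidden = x * 2        # placeholder for an encoder
    with split():             # hypothetical model-parallel region
        output = hidden + 1   # placeholder for a huge output layer
    return output

# Runs locally as plain Python; a framework such as Whale would instead use
# these annotations to rewrite the computation into a distributed one.
print(local_model(3))
```

Nesting one annotated scope inside another would correspond to the hybrid (combined and nested) strategies the abstract mentions, e.g. replicating a pipeline stage that is itself split across devices.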