Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines

From arXiv. The paper was accepted by the 2021 International Conference for High Performance Computing, Networking, Storage and Analysis (SC'21) and selected as a Best Paper Finalist.

Training large deep learning models at scale is very challenging. This paper proposes Chimera, a novel pipeline parallelism scheme that combines bidirectional pipelines to train large-scale models efficiently. Chimera is a synchronous approach and therefore incurs no loss of accuracy, which makes it more convergence-friendly than asynchronous approaches. Compared with the latest synchronous pipeline approach, Chimera reduces the number of pipeline bubbles (idle slots) by up to 50%; thanks to the sophisticated scheduling of its bidirectional pipelines, it also has more balanced activation memory consumption. Evaluations are conducted on Transformer-based language models. For a GPT-2 model with 1.3 billion parameters running on 2,048 GPU nodes of the Piz Daint supercomputer, Chimera improves training throughput by 1.16x to 2.34x over state-of-the-art synchronous and asynchronous pipeline approaches.
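
To make "bidirectional pipelines" concrete, the sketch below shows one way to place the stages of two pipelines that run in opposite directions over the same D workers. It is a minimal illustration, not the paper's implementation: it assumes the "down" pipeline puts stage s on worker s and the "up" pipeline puts stage s on worker D - 1 - s, and the function name `bidirectional_mapping` is hypothetical. Scheduling of micro-batches, which is where the bubble reduction comes from, is not modeled here.

```python
from typing import Dict, List, Tuple


def bidirectional_mapping(num_stages: int) -> Dict[int, List[Tuple[str, int]]]:
    """Return, for each worker, the (pipeline, stage) pairs it hosts.

    Illustrative assumption: the "down" pipeline maps stage s to worker s,
    and the "up" pipeline maps stage s to worker num_stages - 1 - s.
    """
    mapping: Dict[int, List[Tuple[str, int]]] = {w: [] for w in range(num_stages)}
    for stage in range(num_stages):
        # Down pipeline: stages laid out in increasing worker order.
        mapping[stage].append(("down", stage))
        # Up pipeline: same stages laid out in reverse worker order.
        mapping[num_stages - 1 - stage].append(("up", stage))
    return mapping


if __name__ == "__main__":
    for worker, stages in bidirectional_mapping(4).items():
        print(f"worker {worker}: {stages}")
```

Under this assumed layout every worker hosts exactly two stages, s and D - 1 - s, so a worker that holds an early stage of one pipeline also holds a late stage of the other; note that it also means each worker stores the weights of two stages.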

