Large transformer models have demonstrated promising performance on a wide range of natural language processing (NLP) tasks. Although the AI community has scaled models to the trillion-parameter level, the practical deployment of 10-100 billion parameter models remains difficult due to latency, throughput, and memory constraints. In this paper, we propose EnergonAI to address the challenges of efficiently deploying 10-100 billion parameter transformer models on single- or multi-GPU systems. EnergonAI adopts a hierarchy-controller system architecture to coordinate multiple devices and efficiently support different parallel patterns. It delegates the execution of sub-models to multiple workers in a single-controller style and applies tensor parallelism and pipeline parallelism among the workers in a multi-controller style. On top of this architecture, we propose three techniques: non-blocking pipeline parallelism, distributed redundant computation elimination, and peer memory pooling. EnergonAI enables users to write complex parallel code as if it were serial. Compared with FasterTransformer, EnergonAI demonstrates superior latency and throughput. In our experiments, EnergonAI achieves a 37% latency reduction with tensor parallelism and a 10% scalability improvement with pipeline parallelism, and it enlarges the model scale that can be inferred on a single GPU by exploiting a larger heterogeneous memory space at the cost of a limited performance penalty.
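To make the hierarchy-controller idea concrete, the following is a minimal, hypothetical Python sketch, not the EnergonAI API: the `Controller`, `Worker`, and `run_shard` names are invented for illustration. A single controller receives a serial-looking call and dispatches the work to several workers (single-controller style), each of which runs its own shard of the model before the partial results are combined (the workers' peer-to-peer coordination, which EnergonAI handles in a multi-controller style, is reduced here to a simple reduction step).

```python
# Hypothetical sketch of the hierarchy-controller pattern; not EnergonAI code.
from concurrent.futures import ThreadPoolExecutor
from typing import List


class Worker:
    """Owns one shard of the model (e.g. one tensor-parallel slice)."""

    def __init__(self, rank: int, num_workers: int):
        self.rank = rank
        self.num_workers = num_workers

    def run_shard(self, x: List[float]) -> List[float]:
        # Stand-in for executing this worker's sub-model on its device.
        return [v * (self.rank + 1) for v in x]


class Controller:
    """Single controller: user code calls it as if the model were serial."""

    def __init__(self, num_workers: int):
        self.workers = [Worker(r, num_workers) for r in range(num_workers)]
        self.pool = ThreadPoolExecutor(max_workers=num_workers)

    def forward(self, x: List[float]) -> List[float]:
        # Dispatch the input to every worker in parallel, then reduce the
        # partial outputs; all parallelism is hidden behind this method.
        futures = [self.pool.submit(w.run_shard, x) for w in self.workers]
        partials = [f.result() for f in futures]
        return [sum(vals) for vals in zip(*partials)]


if __name__ == "__main__":
    model = Controller(num_workers=4)
    # Serial-looking call, parallel execution underneath.
    print(model.forward([1.0, 2.0, 3.0]))
```

The point of the sketch is only the division of roles: the user-facing call site stays serial, the controller handles dispatch, and the workers execute their sub-models concurrently.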