Distributed data-parallel training has been widely adopted for deep neural network (DNN) models. Although current deep learning (DL) frameworks scale well on dense models such as image classification models, we find that they scale relatively poorly on sparse models such as natural language processing (NLP) models with highly sparse embedding tables. Most existing works overlook the sparsity of model parameters and therefore suffer from significant yet unnecessary communication overhead. In this paper, we propose EmbRace, an efficient communication framework that accelerates the communication of distributed training for sparse models. EmbRace introduces Sparsity-aware Hybrid Communication, which integrates AlltoAll and model parallelism into data-parallel training to reduce the communication overhead of highly sparse parameters. To effectively overlap sparse communication with both backward and forward computation, EmbRace further designs a 2D Communication Scheduling approach that optimizes the model computation procedure, relaxes the dependencies on embeddings, and schedules the sparse communication of each embedding row with a priority queue. We have implemented a prototype of EmbRace on top of PyTorch and Horovod, and conducted comprehensive evaluations with four representative NLP models. Experimental results show that EmbRace achieves up to a 2.41X speedup over state-of-the-art distributed training baselines.
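The core mechanism named above, Sparsity-aware Hybrid Communication, exchanges sparse embedding rows with AlltoAll under a model-parallel partition of the embedding tables. The sketch below is a minimal illustration of that general idea in PyTorch, not EmbRace's actual implementation: it assumes an initialized torch.distributed process group whose backend supports AlltoAll (e.g., NCCL), a contiguously row-sharded embedding table with `rows_per_rank` rows per worker, and hypothetical helper names chosen only for this example.

```python
# Minimal sketch (illustrative assumptions, not EmbRace's code): each rank owns a
# contiguous slice of the embedding table and uses AlltoAll to exchange only the
# rows its minibatch actually references, rather than allreducing dense gradients.
import torch
import torch.distributed as dist


def sharded_embedding_lookup(local_shard, indices, rows_per_rank, world_size):
    """local_shard: (rows_per_rank, dim) rows owned by this rank.
    indices: 1-D LongTensor of global row ids requested by this rank's minibatch.
    Tensors must live on the device required by the backend (e.g., CUDA for NCCL)."""
    owner = indices // rows_per_rank                      # rank that owns each id
    send_ids = [indices[owner == r].contiguous() for r in range(world_size)]

    # Exchange per-peer request counts so receive buffers can be sized correctly.
    send_counts = torch.tensor([t.numel() for t in send_ids], dtype=torch.int64)
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)

    # AlltoAll #1: ship the requested row ids to their owning ranks.
    recv_ids = [torch.empty(int(n), dtype=indices.dtype) for n in recv_counts]
    dist.all_to_all(recv_ids, send_ids)

    # AlltoAll #2: answer each request with the corresponding embedding rows.
    send_rows = [local_shard[ids % rows_per_rank] for ids in recv_ids]
    recv_rows = [torch.empty(int(n), local_shard.shape[1], dtype=local_shard.dtype)
                 for n in send_counts]
    dist.all_to_all(recv_rows, send_rows)

    # The caller reassembles recv_rows into minibatch order using send_ids.
    return send_ids, recv_rows
```

A symmetric pair of AlltoAll calls in the backward pass would return the gradients of only these requested rows to their owners, which is where the savings over a dense AllReduce of the full embedding tables would come from.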