Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets. However, the performance of large-scale distributed training is often far from linear speed-up in practice. Given the complexity of distributed systems, it is challenging to identify the root cause(s) of inefficiency and apply effective performance optimizations when unexpectedly low training speed occurs. To date, there exists no software tool that diagnoses performance issues and helps expedite distributed DNN training across different deep learning frameworks. This paper proposes dPRO, a toolkit that includes: (1) an efficient profiler that collects runtime traces of distributed DNN training across multiple frameworks, especially fine-grained communication traces, and constructs global data flow graphs including detailed communication operations for accurate replay; (2) an optimizer that effectively identifies performance bottlenecks and explores optimization strategies (from computation, communication, and memory aspects) for training acceleration. We implement dPRO on multiple deep learning frameworks (TensorFlow, MXNet) and representative communication schemes (AllReduce and Parameter Server). Extensive experiments show that dPRO predicts the performance of distributed training in various settings with <5% error in most cases and finds optimization strategies with up to 3.48x speed-up over the baselines.
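To make the "global data flow graph and replay" idea concrete, below is a minimal illustrative sketch (not dPRO's actual implementation): per-worker traces are assumed to provide operation durations and dependency edges spanning both computation and communication operations, and iteration time is estimated by propagating earliest finish times through the merged graph. All operation names, durations, and edges are hypothetical, and the sketch ignores device/queue contention that a real replayer must model.

    from collections import defaultdict

    # Hypothetical per-op durations (ms) from two workers plus a shared AllReduce op.
    durations = {
        "w0.fwd": 3.0, "w0.bwd": 5.0, "w1.fwd": 3.5, "w1.bwd": 5.5,
        "allreduce.grad": 2.0, "w0.update": 0.5, "w1.update": 0.5,
    }
    # Dependency edges of the merged (global) data flow graph.
    edges = [
        ("w0.fwd", "w0.bwd"), ("w1.fwd", "w1.bwd"),
        ("w0.bwd", "allreduce.grad"), ("w1.bwd", "allreduce.grad"),
        ("allreduce.grad", "w0.update"), ("allreduce.grad", "w1.update"),
    ]

    def replay(durations, edges):
        """Estimate iteration time by propagating earliest finish times
        in topological order over the global graph."""
        preds, succs = defaultdict(list), defaultdict(list)
        indeg = {op: 0 for op in durations}
        for u, v in edges:
            preds[v].append(u)
            succs[u].append(v)
            indeg[v] += 1
        ready = [op for op, d in indeg.items() if d == 0]
        finish = {}
        while ready:
            op = ready.pop()
            start = max((finish[p] for p in preds[op]), default=0.0)
            finish[op] = start + durations[op]
            for s in succs[op]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    ready.append(s)
        return max(finish.values())

    print(f"Predicted iteration time: {replay(durations, edges):.1f} ms")

Under these assumed numbers the critical path is w1.fwd -> w1.bwd -> allreduce.grad -> update, giving 11.5 ms; a replay-based predictor of this kind also makes it easy to re-estimate iteration time after a what-if change (e.g., shorter communication time).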