Distributed training using multiple devices (e.g., GPU servers) has been widely adopted for learning DNN models over large datasets. However, the performance of large-scale distributed training is often far from linear speed-up in practice. Given the complexity of distributed systems, it is challenging to identify the root cause(s) of inefficiency and apply effective performance optimizations when unexpectedly low training speed occurs. To date, there exists no software tool that diagnoses performance issues and helps expedite distributed DNN training across different machine learning frameworks. This paper proposes dPRO, a toolkit that includes: (1) an efficient profiler that collects runtime traces of distributed DNN training across multiple frameworks, especially fine-grained communication traces, and constructs global data-flow graphs including detailed communication operations for accurate replay; (2) an optimizer that effectively identifies performance bottlenecks and explores optimization strategies (from the computation, communication, and memory aspects) for training acceleration. We implement dPRO on multiple deep learning frameworks (PyTorch, TensorFlow, MXNet) and representative communication schemes (AllReduce and the Parameter Server architecture). Extensive experiments show that dPRO predicts the performance of distributed training in various settings with <5% error in most cases and finds optimization strategies with up to 87.1% speed-up over the baselines.
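To illustrate the replay idea the abstract describes, the following is a minimal, hypothetical sketch (not dPRO's actual API): given per-operation durations from collected traces and a global data-flow graph of dependencies, a trace-based simulator can predict iteration time by computing each operation's finish time along the critical path.

```python
# Hypothetical sketch of trace-based replay: simulate op start/end times
# over a global data-flow graph to predict per-iteration time.
# Names ("fwd", "bwd", "allreduce", "update") are illustrative only.

def replay(ops, deps):
    """ops: {op_name: duration_ms}; deps: {op_name: [upstream op names]}.
    Returns the simulated finish time of each op (critical-path order)."""
    end = {}

    def finish(op):
        if op not in end:
            # An op starts once all of its upstream dependencies finish.
            start = max((finish(u) for u in deps.get(op, [])), default=0.0)
            end[op] = start + ops[op]
        return end[op]

    for op in ops:
        finish(op)
    return end

# Toy single-iteration trace: forward -> backward -> AllReduce -> update.
ops = {"fwd": 3.0, "bwd": 5.0, "allreduce": 4.0, "update": 1.0}
deps = {"bwd": ["fwd"], "allreduce": ["bwd"], "update": ["allreduce"]}

end_times = replay(ops, deps)
iter_time = max(end_times.values())  # predicted iteration time: 13.0 ms
```

In a real system the graph would contain thousands of computation and fine-grained communication ops merged across workers; the same longest-path simulation then exposes where overlap between computation and communication is lost.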