Fueled by advances in distributed deep learning (DDL), recent years have witnessed a rapidly growing demand for resource-intensive distributed/parallel computing to process DDL jobs. To resolve the network communication bottleneck and load-balancing issues in distributed computing, the so-called ``ring-all-reduce'' decentralized architecture has been increasingly adopted to remove the need for dedicated parameter servers. To date, however, there remains a lack of theoretical understanding of how to design resource optimization algorithms for efficiently scheduling ring-all-reduce DDL jobs in computing clusters. This motivates us to fill this gap by proposing a series of new resource scheduling designs for ring-all-reduce DDL jobs. Our contributions in this paper are three-fold: i) We propose a new resource scheduling analytical model for ring-all-reduce deep learning, which covers a wide range of objectives in DDL performance optimization (e.g., excessive training avoidance, energy efficiency, fairness); ii) Based on the proposed performance analytical model, we develop an efficient resource scheduling algorithm called GADGET (greedy ring-all-reduce distributed graph embedding technique), which enjoys a provably strong performance guarantee; iii) We conduct extensive trace-driven experiments to demonstrate the effectiveness of the GADGET approach and its superiority over the state of the art.