An Optimal Resource Allocator of Elastic Training for Deep Learning Jobs on Cloud - Zhuanzhi Papers


Optimizer · Learned · Reducible · Greedy · Deep learning

September 8, 2021

An Optimal Resource Allocator of Elastic Training for Deep Learning Jobs on Cloud


Liang Hu, Jiangcheng Zhu, Zirui Zhou, Ruiqing Cheng, Xiaolong Bai, Yong Zhang

Cloud training platforms, such as Amazon Web Services and Huawei Cloud, provide users with computational resources to train their deep learning jobs. Elastic training is a service embedded in cloud training platforms that dynamically scales up or down the resources allocated to a job. The core technique of an elastic training system is to allocate limited resources among heterogeneous jobs so as to achieve shorter queueing delay and higher training efficiency. This paper presents an optimal resource allocator for elastic training systems that leverages a mixed-integer programming (MIP) model to maximize the training progress of deep learning jobs. We use real-world job data obtained from ModelArts, the deep learning training platform of Huawei Cloud, and conduct simulation experiments to compare the optimal resource allocator against a greedy allocator as the benchmark. Numerical results show that the proposed allocator can reduce queueing time by up to 32% and improve training efficiency by up to 24% relative to the greedy resource allocator, thereby greatly improving user experience with Huawei ModelArts and potentially enabling higher profits for the product. The optimal resource allocator is also fast in decision-making, taking merely 0.4 seconds on average.
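The abstract does not spell out the MIP formulation, so the following is only a toy sketch of the kind of decision it describes: choosing an integer number of nodes per job to maximize total training progress under a capacity limit, compared with a greedy first-come allocator. Everything here is an assumption made for illustration: the `Job` type, the `rates` tables, the function names, and the numbers are hypothetical, and the exact optimum is found with a small dynamic program over node counts rather than a MIP solver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    # Hypothetical table: allowed node count -> training progress per unit time.
    rates: dict


def optimal_allocation(jobs, capacity):
    """Maximize summed progress; each job gets 0 nodes or one allowed count."""
    # best[c] = (total progress, allocation) for the jobs seen so far
    # when at most c nodes may be used.
    best = [(0.0, ())] * (capacity + 1)
    for job in jobs:
        new_best = []
        for c in range(capacity + 1):
            value, alloc = best[c]
            choice = (value, alloc + (0,))  # job stays queued
            for k, rate in job.rates.items():
                if k <= c:
                    prev_value, prev_alloc = best[c - k]
                    candidate = (prev_value + rate, prev_alloc + (k,))
                    if candidate[0] > choice[0]:
                        choice = candidate
            new_best.append(choice)
        best = new_best
    return best[capacity]


def greedy_allocation(jobs, capacity):
    """Baseline: in arrival order, each job grabs as many nodes as it can."""
    free, total, alloc = capacity, 0.0, []
    for job in jobs:
        feasible = [k for k in job.rates if k <= free]
        k = max(feasible) if feasible else 0
        alloc.append(k)
        total += job.rates.get(k, 0.0)
        free -= k
    return total, tuple(alloc)


# Made-up jobs with diminishing returns as nodes are added.
jobs = [
    Job("A", {1: 1.0, 2: 1.8, 4: 3.0, 8: 4.0}),
    Job("B", {1: 1.0, 2: 1.9, 4: 3.2}),
    Job("C", {2: 2.0, 4: 3.5}),
]

opt_total, opt_alloc = optimal_allocation(jobs, capacity=8)
greedy_total, greedy_alloc = greedy_allocation(jobs, capacity=8)
print("optimal:", opt_alloc, round(opt_total, 2))
print("greedy: ", greedy_alloc, round(greedy_total, 2))
```

In this made-up instance the greedy baseline hands all eight nodes to the first job and leaves the other two queued, while the exact search spreads the nodes across all three jobs. That mirrors, in miniature, the queueing and efficiency trade-off the abstract reports, though it says nothing about the paper's actual model or results.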

