ANDREAS: Artificial intelligence traiNing scheDuler foR accElerAted resource clusterS

May 11, 2021

Federica Filippini,Danilo Ardagna,Marco Lattuada,Edoardo Amaldi,Michele Ciavotta,Maciek Riedl,Katarzyna Materka,Paweł Skrzypek,Fabrizio Magugliani,Marco Cicala

Artificial Intelligence (AI) and Deep Learning (DL) algorithms are currently applied to a wide range of products and solutions. DL training jobs are highly resource demanding and benefit greatly from AI accelerators (e.g., GPUs). However, managing GPU-powered clusters effectively poses significant challenges. Among these, efficient scheduling and resource allocation are crucial to maximize performance and minimize data center operational costs. In this paper we propose ANDREAS, an advanced scheduling solution that tackles these problems jointly, aiming to optimize DL training runtime workloads and their energy consumption in accelerated clusters. Simulation-based experiments demonstrate a cost reduction between 30 and 62% on average with respect to first-principle methods, while validation on a real cluster shows a worst-case deviation below 13% between actual and predicted costs, confirming the effectiveness of the ANDREAS solution in practical scenarios.
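The two headline figures in the abstract (average cost reduction and worst-case deviation between actual and predicted costs) are relative metrics. The abstract does not state how they are computed, so the following is only a minimal sketch under assumed definitions: reduction as `(baseline - optimized) / baseline` and deviation as `|actual - predicted| / actual`. The function names and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def cost_reduction_pct(baseline: float, optimized: float) -> float:
    """Relative cost saving of `optimized` versus `baseline`, in percent.

    Assumed definition; the paper may normalize differently.
    """
    if baseline <= 0:
        raise ValueError("baseline cost must be positive")
    return (baseline - optimized) / baseline * 100.0


def worst_case_deviation_pct(actual: Sequence[float],
                             predicted: Sequence[float]) -> float:
    """Largest relative gap between actual and predicted costs, in percent.

    Assumed definition: max over runs of |actual - predicted| / actual.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return max(abs(a - p) / a * 100.0 for a, p in zip(actual, predicted))


if __name__ == "__main__":
    # Made-up illustrative costs, not results from the paper.
    print(cost_reduction_pct(baseline=100.0, optimized=60.0))
    print(worst_case_deviation_pct([100.0, 200.0], [90.0, 190.0]))
```

Under these assumed definitions, a claim such as "worst-case deviation below 13%" would mean that the second function returns a value under 13 across all validated runs.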
