雨果:一个高效学习选择补充数据插件工作的群集调度器 (Hugo: A Cluster Scheduler that Efficiently Learns to Select Complementary Data-Parallel Jobs) - 专知论文

会员服务 ·

0

簇 · 学成 · 可约的 · Processing（编程语言） · Spark ·

2021 年 2 月 14 日

Hugo: A Cluster Scheduler that Efficiently Learns to Select Complementary Data-Parallel Jobs

翻译：雨果:一个高效学习选择补充数据插件工作的群集调度器

Lauritz Thamsen,Ilya Verbitskiy,Sasho Nedelkoski,Vinh Thuy Tran,Vinicius Meyer,Miguel G. Xavier,Odej Kao,Cesar A. F. De Rose

Distributed data processing systems like MapReduce, Spark, and Flink are popular tools for analysis of large datasets with cluster resources. Yet, users often overprovision resources for their data processing jobs, while the resource usage of these jobs also typically fluctuates considerably. Therefore, multiple jobs usually get scheduled onto the same shared resources to increase the resource utilization and throughput of clusters. However, job runtimes and the utilization of shared resources can vary significantly depending on the specific combinations of co-located jobs. This paper presents Hugo, a cluster scheduler that continuously learns how efficiently jobs share resources, considering metrics for the resource utilization and interference among co-located jobs. The scheduler combines offline grouping of jobs with online reinforcement learning to provide a scheduling mechanism that efficiently generalizes from specific monitored job combinations yet also adapts to changes in workloads. Our evaluation of a prototype shows that the approach can reduce the runtimes of exemplary Spark jobs on a YARN cluster by up to 12.5%, while resource utilization is increased and waiting times can be bounded.

翻译：分布式数据处理系统,如MapReduce、Spark和Flink等,是分析大型数据集集集资源的流行工具。然而,用户往往为数据处理工作提供过多的资源,而这些工作的资源使用也通常波动很大。因此,多份工作通常被安排在同一共享资源上,以增加资源利用和集群的吞吐量。然而,工作运行时间和共享资源的利用可能因合用工作的具体组合而大不相同。本文介绍雨果,这是不断学习如何高效地共享资源、考虑资源利用指标和共同部署工作之间干扰的群集调度器。调度器将脱线工作与在线强化学习结合起来,以提供一个安排机制,有效地从特定监测的工作组合中归纳,同时适应工作量的变化。我们对原型的评估表明,该方法可以将YARN组模范的Spark工作运行时间降低到12.5%,同时资源利用增加,等待的时间可以被捆绑在一起。

0

相关内容

Python编程基础，121页ppt

Python编程基础，121页ppt

专知会员服务

49+阅读 · 2021年1月1日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

324+阅读 · 2020年11月26日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

【AAAI Tutorials 2019】联合学习：机器学习中的用户隐私，数据安全性和机密性（Federated Learning: User Privacy, Data Security and Confidentiality in Machine Learning）

【AAAI Tutorials 2019】联合学习：机器学习中的用户隐私，数据安全性和机密性（Federated Learning: User Privacy, Data Security and Confidentiality in Machine Learning）

专知会员服务

15+阅读 · 2019年11月18日

【O'Reilly AI Conference 2019】部署大规模分布式数据（How to deploy large-scale distributed data analytics and machine learning on containers (sponsored by HPE))，HPE BlueData，Thomas Phelan

【O'Reilly AI Conference 2019】部署大规模分布式数据（How to deploy large-scale distributed data analytics and machine learning on containers (sponsored by HPE))，HPE BlueData，Thomas Phelan

专知会员服务

19+阅读 · 2019年11月5日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

9+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【计算机类】期刊专刊/国际会议截稿信息6条

【计算机类】期刊专刊/国际会议截稿信息6条

Call4Papers

3+阅读 · 2017年10月13日

FedGroup: Efficient Clustered Federated Learning via Decomposed Data-Driven Measure

Arxiv

0+阅读 · 2021年4月8日

One-bit Spectrum Sensing with the Eigenvalue Moment Ratio Approach

Arxiv

0+阅读 · 2021年4月8日

Empowering Prosumer Communities in Smart Grid with Wireless Communications and Federated Edge Learning

Empowering Prosumer Communities in Smart Grid with Wireless Communications and Federated Edge Learning

Arxiv

0+阅读 · 2021年4月7日

Optimal CPU Scheduling in Data Centers via a Finite-Time Distributed Quantized Coordination Mechanism

Optimal CPU Scheduling in Data Centers via a Finite-Time Distributed Quantized Coordination Mechanism

Arxiv

0+阅读 · 2021年4月7日

Flower: A Friendly Federated Learning Research Framework

Arxiv

0+阅读 · 2021年4月7日

A Survey on Distributed Machine Learning

Arxiv

45+阅读 · 2019年12月20日

Advances and Open Problems in Federated Learning

Advances and Open Problems in Federated Learning

Arxiv

18+阅读 · 2019年12月10日

One-Shot Federated Learning

One-Shot Federated Learning

Arxiv

9+阅读 · 2019年3月5日

MSc Dissertation: Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach

MSc Dissertation: Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach

Arxiv

6+阅读 · 2018年9月13日

To Cluster, or Not to Cluster: An Analysis of Clusterability Methods

To Cluster, or Not to Cluster: An Analysis of Clusterability Methods

Arxiv

4+阅读 · 2018年8月24日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

Python编程基础，121页ppt

Python编程基础，121页ppt

专知会员服务

49+阅读 · 2021年1月1日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

324+阅读 · 2020年11月26日

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

神经网络序列数据建模，229页ppt，Modeling Sequential Data with Neural Nets

专知会员服务

67+阅读 · 2020年7月25日

【AAAI Tutorials 2019】联合学习：机器学习中的用户隐私，数据安全性和机密性（Federated Learning: User Privacy, Data Security and Confidentiality in Machine Learning）

【AAAI Tutorials 2019】联合学习：机器学习中的用户隐私，数据安全性和机密性（Federated Learning: User Privacy, Data Security and Confidentiality in Machine Learning）

专知会员服务

15+阅读 · 2019年11月18日

【O'Reilly AI Conference 2019】部署大规模分布式数据（How to deploy large-scale distributed data analytics and machine learning on containers (sponsored by HPE))，HPE BlueData，Thomas Phelan

【O'Reilly AI Conference 2019】部署大规模分布式数据（How to deploy large-scale distributed data analytics and machine learning on containers (sponsored by HPE))，HPE BlueData，Thomas Phelan

专知会员服务

19+阅读 · 2019年11月5日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

热门VIP内容

开通专知VIP会员享更多权益服务

检索增强生成（RAG）技术，261页slides

美联参会指南-联合规划与执行概述及政策框架 | 32页

从DeepSeek-R1学到的三个核心经验

大规模视觉模型中的提示式适配：综述

相关资讯

分布式并行架构Ray介绍

分布式并行架构Ray介绍

CreateAMind

10+阅读 · 2019年8月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Call for Participation: Shared Tasks in NLPCC 2019

Call for Participation: Shared Tasks in NLPCC 2019

中国计算机学会

5+阅读 · 2019年3月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

meta learning 17年：MAML SNAIL

meta learning 17年：MAML SNAIL

CreateAMind

11+阅读 · 2019年1月2日

Ray RLlib: Scalable 降龙十八掌

Ray RLlib: Scalable 降龙十八掌

CreateAMind

9+阅读 · 2018年12月28日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【计算机类】期刊专刊/国际会议截稿信息6条

【计算机类】期刊专刊/国际会议截稿信息6条

Call4Papers

3+阅读 · 2017年10月13日

相关论文

FedGroup: Efficient Clustered Federated Learning via Decomposed Data-Driven Measure

Arxiv

0+阅读 · 2021年4月8日

One-bit Spectrum Sensing with the Eigenvalue Moment Ratio Approach

Arxiv

0+阅读 · 2021年4月8日

Empowering Prosumer Communities in Smart Grid with Wireless Communications and Federated Edge Learning

Empowering Prosumer Communities in Smart Grid with Wireless Communications and Federated Edge Learning

Arxiv

0+阅读 · 2021年4月7日

Optimal CPU Scheduling in Data Centers via a Finite-Time Distributed Quantized Coordination Mechanism

Optimal CPU Scheduling in Data Centers via a Finite-Time Distributed Quantized Coordination Mechanism

Arxiv

0+阅读 · 2021年4月7日

Flower: A Friendly Federated Learning Research Framework

Arxiv

0+阅读 · 2021年4月7日

A Survey on Distributed Machine Learning

Arxiv

45+阅读 · 2019年12月20日

Advances and Open Problems in Federated Learning

Advances and Open Problems in Federated Learning

Arxiv

18+阅读 · 2019年12月10日

One-Shot Federated Learning

One-Shot Federated Learning

Arxiv

9+阅读 · 2019年3月5日

MSc Dissertation: Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach

MSc Dissertation: Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach

Arxiv

6+阅读 · 2018年9月13日

To Cluster, or Not to Cluster: An Analysis of Clusterability Methods

To Cluster, or Not to Cluster: An Analysis of Clusterability Methods

Arxiv

4+阅读 · 2018年8月24日

微信扫码咨询专知VIP会员