Resource sharing between multiple workloads has become a prominent practice among cloud service providers, motivated by demand for improved resource utilization and reduced cost of ownership. Effective resource sharing, however, remains an open challenge due to the adverse effects that resource contention can have on high-priority, user-facing workloads with strict Quality of Service (QoS) requirements. Although recent approaches have demonstrated promising results, they remain largely impractical in public cloud environments, since workloads are not known in advance and may run for only a brief period, prohibiting offline learning and significantly hindering online learning. In this paper, we propose RAPID, a novel framework for fast, fully online resource allocation policy learning in highly dynamic operating environments. RAPID leverages lightweight QoS predictions, enabled by domain-knowledge-inspired techniques for sample efficiency and bias reduction, to decouple control from conventional feedback sources and guide policy learning at a rate orders of magnitude faster than prior work. Evaluation on a real-world server platform with representative cloud workloads confirms that RAPID learns stable resource allocation policies in minutes, compared with hours for the prior state of the art, while improving QoS by 9.0x and increasing best-effort workload performance by 19-43%.
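To make the mechanism concrete, below is a minimal Python sketch of the kind of prediction-guided control loop the abstract describes: instead of waiting on slow end-to-end latency feedback, each interval a lightweight QoS predictor scores candidate allocations, and the controller picks the cheapest allocation predicted to still meet QoS, freeing the rest for best-effort workloads. Every name here (QoSPredictor, measure_qos, the knob ranges) is hypothetical, the synthetic QoS model is invented for illustration, and the simple online linear predictor stands in for RAPID's actual domain-knowledge-inspired techniques; this is an illustrative sketch, not the paper's implementation.

```python
# Illustrative sketch only: prediction-guided online resource allocation.
# All names and the synthetic QoS model are hypothetical, not RAPID's design.
import itertools
import random

CACHE_WAYS = range(1, 12)   # ways reserved for the latency-critical (LC) job
CORES = range(1, 9)         # cores reserved for the LC job
QOS_TARGET = 1.0            # normalized QoS: >= 1.0 means the SLO is met

def features(alloc):
    ways, cores = alloc
    return [1.0, ways, cores]

class QoSPredictor:
    """Online linear QoS model, refined with one SGD step per observation."""
    def __init__(self, n_features, lr=0.005):
        self.w = [0.0] * n_features
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, qos):
        err = qos - self.predict(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]

def measure_qos(alloc):
    """Synthetic stand-in for a real QoS measurement (e.g. tail-latency slack)."""
    ways, cores = alloc
    return 0.15 * ways + 0.12 * cores + random.gauss(0, 0.02)

predictor = QoSPredictor(n_features=3)
alloc = (max(CACHE_WAYS), max(CORES))          # start with a conservative allocation

for step in range(200):
    qos = measure_qos(alloc)                   # feedback for this interval
    predictor.update(features(alloc), qos)     # refine the predictor online

    # Pick the cheapest allocation the predictor believes still meets QoS,
    # leaving the remaining ways/cores to best-effort workloads.
    safe = [a for a in itertools.product(CACHE_WAYS, CORES)
            if predictor.predict(features(a)) >= QOS_TARGET]
    if safe:
        alloc = min(safe, key=lambda a: a[0] + a[1])

print("converged allocation (ways, cores):", alloc)
```

In a real deployment the knobs would map to hardware and OS mechanisms (for example, cache partitioning and core pinning) and QoS would be a measured quantity such as tail-latency slack rather than a synthetic function.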