Deep Reinforcement Agent for Scheduling in HPC

The cluster scheduler is crucial in high-performance computing (HPC): it determines when and which user jobs are allocated to available system resources. Existing cluster scheduling heuristics are developed by human experts based on their experience with specific HPC systems and workloads. However, the increasing complexity of computing systems and the highly dynamic nature of application workloads place a tremendous burden on manually designed and tuned scheduling heuristics. More aggressive optimization and automation are needed for cluster scheduling in HPC. In this work, we present DRAS (Deep Reinforcement Agent for Scheduling), an automated HPC scheduling agent that leverages deep reinforcement learning. DRAS is built on a novel hierarchical neural network that incorporates HPC-specific scheduling features such as resource reservation and backfilling. A unique training strategy is presented that enables DRAS to rapidly learn the target environment. Once the system manager provides a specific scheduling objective, DRAS automatically learns to improve its policy through interaction with the scheduling environment and dynamically adjusts that policy as the workload changes. Experiments with different production workloads demonstrate that DRAS outperforms existing heuristic and optimization approaches by up to 45%.
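
For readers unfamiliar with the two scheduling features named above, the following is a minimal sketch of a conventional, hand-written heuristic: first-come-first-served with a reservation for the blocked head job and EASY-style backfilling. It is not DRAS and not taken from the paper; it only illustrates the kind of rule that a learned policy would replace or build on. The `Job` class, the `easy_backfill` function, and all parameter names are hypothetical, and node counts and walltimes are assumed to be plain integers.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int      # number of nodes requested
    walltime: int   # user-estimated runtime (assumed accurate here)


def easy_backfill(queue, running, free_nodes, now=0):
    """Return the names of jobs to start at time `now`.

    queue:   waiting jobs in arrival (FCFS) order
    running: list of (end_time, nodes) for jobs currently executing
    """
    started = []
    queue = list(queue)

    # 1. Plain FCFS: start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running = running + [(now + job.walltime, job.nodes)]
        started.append(job.name)
    if not queue:
        return started

    # 2. Reservation: find the earliest time ("shadow time") at which
    #    enough nodes will be free for the blocked head job.
    head = queue[0]
    available = free_nodes
    shadow_time = None
    for end_time, nodes in sorted(running):
        available += nodes
        if available >= head.nodes:
            shadow_time = end_time
            break
    if shadow_time is None:
        return started  # head job exceeds the whole system; nothing to reserve
    spare = available - head.nodes  # nodes left over once the head job starts

    # 3. Backfilling: start later jobs only if they cannot delay the
    #    reservation, i.e. they finish before the shadow time or fit
    #    entirely within the spare nodes.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        ends_before_reservation = now + job.walltime <= shadow_time
        if ends_before_reservation or job.nodes <= spare:
            if not ends_before_reservation:
                spare -= job.nodes
            free_nodes -= job.nodes
            started.append(job.name)
    return started


if __name__ == "__main__":
    # 10-node system: one running job holds 6 nodes until t=100, 4 are free.
    waiting = [Job("A", 8, 50), Job("B", 2, 200), Job("C", 3, 50), Job("D", 2, 30)]
    print(easy_backfill(waiting, running=[(100, 6)], free_nodes=4))
```

In this example, job A is blocked and receives a reservation at t=100, while later jobs may jump ahead only when they provably leave that reservation intact. Rules like this are fixed by hand; the abstract's claim is that DRAS learns when to run, reserve, or backfill from interaction with the scheduling environment instead.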

