Job Scheduling in Datacenters using Constraint Controlled RL

This paper studies a model for online job scheduling in green datacenters, where resource availability depends on the power supplied by renewable sources. Intermittent renewable power makes resource availability intermittent, which induces job delays and their associated costs. Green datacenter operators must therefore manage their workloads and available power supply intelligently to extract maximum benefit. The scheduler's objective is to schedule jobs on a set of resources so as to maximize the total job value (revenue) while minimizing the overall job delay. These two aims are in opposition: achieving high job value (reward) trades off against achieving low expected delay (cost). In addition, datacenter operators often prioritize multiple objectives, including high system utilization and job completion. To reconcile the opposing goals of maximizing total job value and minimizing job delays, we apply Proportional-Integral-Derivative (PID) Lagrangian methods in Deep Reinforcement Learning to the job scheduling problem in the green datacenter environment. Lagrangian methods are widely used for constrained optimization problems. We adopt a controls perspective and learn the Lagrange multiplier with proportional, integral, and derivative control, which yields favorable learning dynamics. Feedback control defines cost terms for the learning agent, monitors the cost limits during training, and continuously adjusts the learning parameters to achieve stable performance. Our experiments demonstrate improved performance compared to scheduling policies without the PID Lagrangian methods, and the results illustrate the effectiveness of the Constraint Controlled Reinforcement Learning (CoCoRL) scheduler in satisfying multiple objectives simultaneously.
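The abstract does not spell out the multiplier update, so the following is only a minimal sketch of how a PID-controlled Lagrange multiplier is commonly written, not the authors' implementation. It assumes the cost signal is a scalar such as the mean job delay per training iteration, and that the agent optimizes a penalized objective of the form reward minus multiplier times cost. The class `PIDLagrangian`, the function `penalized_reward`, the gains `kp`, `ki`, `kd`, and the `cost_limit` value are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDLagrangian:
    """PID-controlled Lagrange multiplier (illustrative helper, not the paper's code)."""

    kp: float          # proportional gain (assumed value, tuned per problem)
    ki: float          # integral gain (assumed value)
    kd: float          # derivative gain (assumed value)
    cost_limit: float  # constraint threshold, e.g. the allowed mean job delay
    integral: float = 0.0
    prev_cost: Optional[float] = None

    def update(self, cost: float) -> float:
        """Return the new multiplier given the latest measured cost."""
        error = cost - self.cost_limit
        # Integral term accumulates constraint violation and is clipped at zero.
        self.integral = max(0.0, self.integral + error)
        # Derivative term reacts only to increasing cost; zero on the first call.
        if self.prev_cost is None:
            derivative = 0.0
        else:
            derivative = max(0.0, cost - self.prev_cost)
        self.prev_cost = cost
        # The multiplier itself must stay non-negative.
        return max(
            0.0,
            self.kp * error + self.ki * self.integral + self.kd * derivative,
        )


def penalized_reward(reward: float, cost: float, multiplier: float) -> float:
    """Lagrangian objective for one step: job value minus weighted delay cost."""
    return reward - multiplier * cost


if __name__ == "__main__":
    controller = PIDLagrangian(kp=1.0, ki=0.5, kd=0.0, cost_limit=10.0)
    for measured_delay in (12.0, 8.0):
        lam = controller.update(measured_delay)
        print(measured_delay, lam, penalized_reward(5.0, measured_delay, lam))
```

In this sketch the multiplier rises while the measured delay exceeds the limit and falls back toward zero once the constraint is satisfied, which is the feedback behavior the abstract attributes to the controller. How the cost is measured and how the gains are chosen would depend on the actual scheduler and are not given in the source.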
