Predict-and-Critic: Accelerated End-to-End Predictive Control for Cloud Computing through Reinforcement Learning

Cloud computing holds the promise of reduced costs through economies of scale. To realize this promise, cloud computing vendors typically solve sequential resource allocation problems, in which customer workloads are packed onto shared hardware. Virtual machines (VMs) form the foundation of modern cloud computing because they logically abstract user compute from the shared physical infrastructure. Traditionally, VM packing problems are solved by predicting demand and then running a Model Predictive Control (MPC) optimization over a future horizon. We introduce an approximate formulation of an industrial VM packing problem as a mixed-integer linear program (MILP) with soft constraints parameterized by the predictions. Recently, predict-and-optimize (PnO) was proposed for end-to-end training of prediction models by back-propagating the cost of decisions through the optimization problem. However, PnO does not scale to the large prediction horizons prevalent in cloud computing. To tackle this issue, we propose the Predict-and-Critic (PnC) framework, which leverages reinforcement learning to outperform PnO with just a two-step horizon. PnC jointly trains a prediction model and a terminal Q function that approximates the cost-to-go over a long horizon, by back-propagating the cost of decisions through the optimization problem and from the future. The terminal Q function allows us to solve a much smaller two-step horizon optimization problem than the multi-step horizon problem that PnO requires. We evaluate PnO and PnC on two datasets and three workloads, and with disturbances that are not modeled in the optimization problem. We find that PnC significantly improves decision quality over PnO, even when the optimization problem is not a perfect representation of reality. We also find that hardening the soft constraints of the MILP and back-propagating through the constraints improves decision quality for both PnO and PnC.
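
The role of the terminal Q function in a two-step horizon can be illustrated with a toy sketch. This is not the paper's MILP or training procedure: the brute-force enumeration stands in for the optimization problem, the hand-written `terminal_q` stands in for a learned terminal Q function, and every name and cost coefficient below is a hypothetical choice made for illustration.

```python
from itertools import product


def stage_cost(capacity, demand, prev_capacity,
               run_cost=1.0, shortfall_cost=5.0, switch_cost=0.5):
    """One-step cost: running capacity, a soft penalty for unmet demand, and churn.

    All coefficients are illustrative assumptions, not values from the paper.
    """
    shortfall = max(0.0, demand - capacity)
    churn = abs(capacity - prev_capacity)
    return run_cost * capacity + shortfall_cost * shortfall + switch_cost * churn


def plan_two_step(prev_capacity, predicted_demand, terminal_q, choices=range(0, 6)):
    """Enumerate every two-step capacity plan and return the cheapest one.

    The score is the two stage costs on the predicted demand plus
    terminal_q(final capacity), an estimate of the cost incurred after the horizon.
    """
    best_plan, best_score = None, float("inf")
    for c0, c1 in product(choices, repeat=2):
        score = (stage_cost(c0, predicted_demand[0], prev_capacity)
                 + stage_cost(c1, predicted_demand[1], c0)
                 + terminal_q(c1))
        if score < best_score:
            best_plan, best_score = (c0, c1), score
    return best_plan, best_score


# Without a terminal term the planner is myopic and only covers the predicted demand.
print(plan_two_step(2, [2, 2], lambda c: 0.0))                    # ((2, 2), 4.0)

# A stand-in terminal estimate that expects demand of 4 beyond the horizon
# makes the same two-step planner provision capacity in advance.
print(plan_two_step(2, [2, 2], lambda c: 3.0 * max(0, 4 - c)))    # ((2, 4), 7.0)
```

In this toy setting the terminal term changes the second-step decision even though the horizon and the predicted demand are unchanged, which is the effect the abstract attributes to the terminal Q function. In PnC that term is learned jointly with the prediction model rather than written by hand.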
