Oversubscription is a common practice for improving cloud resource utilization. It allows the cloud service provider to sell more resources than the physical limit, on the assumption that not all users will fully utilize their resources simultaneously. However, how to design an oversubscription policy that improves utilization while satisfying safety constraints remains an open problem. Existing methods and industrial practices are over-conservative, ignoring the coordination of diverse resource usage patterns and probabilistic constraints. To address these two limitations, this paper formulates cloud oversubscription as a chance-constrained optimization problem and proposes an effective Chance Constrained Multi-Agent Reinforcement Learning (C2MARL) method to solve it. Specifically, C2MARL reduces the number of constraints by considering their upper bounds and leverages a multi-agent reinforcement learning paradigm to learn a safe and optimal coordination policy. We evaluate C2MARL on an internal cloud platform and public cloud datasets. Experiments show that C2MARL outperforms existing methods in improving utilization ($20\%\sim 86\%$) under different levels of safety constraints.
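To make the notion of a chance constraint concrete, the following is a minimal sketch and not the C2MARL method itself. It assumes synthetic per-tenant usage drawn from a Beta distribution and a hypothetical safety level `delta`, and it uses a brute-force empirical check instead of reinforcement learning: tenants are admitted one at a time as long as the estimated probability that total usage exceeds physical capacity stays at or below `delta`. All names (`violation_probability`, `max_safe_tenants`, `requested`, `capacity`) are illustrative assumptions.

```python
import random


def violation_probability(usage_samples, capacity):
    """Fraction of sampled time steps whose total usage exceeds capacity."""
    violations = sum(1 for step in usage_samples if sum(step) > capacity)
    return violations / len(usage_samples)


def max_safe_tenants(usage_samples, capacity, delta):
    """Largest k such that admitting the first k tenants keeps the
    empirical violation probability at or below delta.

    Usage is non-negative, so the violation probability can only grow
    with k, and the search may stop at the first infeasible k.
    """
    n_tenants = len(usage_samples[0])
    best = 0
    for k in range(1, n_tenants + 1):
        truncated = [step[:k] for step in usage_samples]
        if violation_probability(truncated, capacity) <= delta:
            best = k
        else:
            break
    return best


# Synthetic workload (an assumption for illustration only): each tenant
# requests 4.0 units but uses a Beta(2, 5)-distributed fraction of them.
rng = random.Random(0)
n_steps, n_tenants = 2000, 40
requested = 4.0
samples = [
    [requested * rng.betavariate(2, 5) for _ in range(n_tenants)]
    for _ in range(n_steps)
]

# Physical capacity fits exactly 8 tenants without oversubscription.
capacity = 32.0

strict = max_safe_tenants(samples, capacity, delta=0.01)
relaxed = max_safe_tenants(samples, capacity, delta=0.10)
print("tenants admitted at delta=0.01:", strict)
print("tenants admitted at delta=0.10:", relaxed)
```

In this toy setting a looser safety level admits at least as many tenants as a stricter one, which mirrors the trade-off between utilization and safety constraints described in the abstract.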