Latency-critical (LC) services are widely deployed in cloud environments. For cost efficiency, multiple services are usually co-located on a single server, so run-time resource scheduling becomes pivotal for QoS control in these complicated co-location cases. However, the scheduling exploration space grows rapidly as server resources increase, making it hard for schedulers to find ideal solutions quickly. More importantly, we observe that there are "resource cliffs" in the scheduling exploration space. They reduce exploration efficiency and always lead to severe QoS fluctuations, and previous schedulers cannot easily avoid them. To address these problems, we propose OSML, a novel ML-based intelligent scheduler. It learns the correlation between architectural hints (e.g., IPC, cache misses, and memory footprint), scheduling solutions, and QoS demands from a data set we collected from 11 widely deployed services running on off-the-shelf servers. OSML employs multiple ML models that work collaboratively to predict QoS variations, shepherd the scheduling, and recover from QoS violations in complicated co-location cases. OSML can intelligently avoid resource cliffs during scheduling and reach an optimal solution much faster than previous approaches for co-located LC services. Experimental results show that OSML supports higher loads and meets QoS targets with lower scheduling overhead and shorter convergence time than previous approaches.
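The abstract does not define a resource cliff formally. One plausible reading is an allocation at which taking away a single unit of some resource makes latency jump sharply. The Python sketch below illustrates only that reading and is not OSML's method: the two resource dimensions (cores and cache ways), the latency model `synthetic_latency`, the helper `find_cliffs`, and the `jump` threshold are all hypothetical, and the numbers are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical allocation: (cores, cache_ways). Real schedulers track more dimensions.
Alloc = Tuple[int, int]


def find_cliffs(latency_ms: Dict[Alloc, float], jump: float = 5.0) -> List[Alloc]:
    """Return allocations where removing one unit of either resource
    multiplies latency by at least `jump` (an assumed threshold)."""
    cliffs = []
    for (cores, ways), lat in sorted(latency_ms.items()):
        # Neighbors reached by giving up one core or one cache way.
        for neighbor in ((cores - 1, ways), (cores, ways - 1)):
            worse = latency_ms.get(neighbor)
            if worse is not None and worse >= jump * lat:
                cliffs.append((cores, ways))
                break
    return cliffs


def synthetic_latency(cores: int, ways: int) -> float:
    """Made-up model: latency is flat with at least 3 cores and 2 ways,
    and far higher below either limit."""
    if cores >= 3 and ways >= 2:
        return 10.0
    return 200.0


# Exploration space: 1..5 cores by 1..4 cache ways.
grid = {(c, w): synthetic_latency(c, w) for c in range(1, 6) for w in range(1, 5)}
cliffs = find_cliffs(grid)
print(cliffs)
```

In this toy space the cliff edge runs along 3 cores and 2 cache ways. A scheduler that reclaims resources one unit at a time from such an allocation would step off the edge, which is the kind of QoS fluctuation the abstract describes.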