Cloud applications are increasingly shifting from large monolithic services to large numbers of loosely-coupled, specialized microservices. Although microservices ease development and deployment and improve modularity and isolation, they complicate resource management, because dependencies between them introduce backpressure effects and cascading QoS violations. We present Sinan, a data-driven cluster manager for interactive cloud microservices that is online and QoS-aware. Sinan leverages a set of scalable and validated machine learning models to determine the performance impact of dependencies between microservices, and to allocate appropriate resources per tier in a way that preserves the end-to-end tail latency target. We evaluate Sinan both on dedicated local clusters and on large-scale deployments on Google Compute Engine (GCE), across representative end-to-end applications built with microservices, such as social networks and hotel reservation sites. We show that Sinan always meets QoS while also keeping cluster utilization high, in contrast to prior work, which either leads to unpredictable performance or sacrifices resource efficiency. Furthermore, the techniques in Sinan are explainable, meaning that cloud operators can gain insights from the ML models on how to better deploy and design their applications to reduce unpredictable performance.