As cloud applications shift from monoliths to loosely coupled microservices, application developers must decide how many compute resources (e.g., VMs) to give to each microservice within an application. This decision affects both (1) the dollar cost to the application developer and (2) the end-to-end latency perceived by the application user. Today, individual microservices are autoscaled by adding VMs whenever utilization metrics (e.g., CPU, RAM) cross a configurable threshold. Utilization-based autoscaling is simple to understand at the level of an individual microservice. However, an application user's end-to-end latency consists of time spent on multiple microservices, and each microservice might need a different threshold to meet the overall end-to-end latency target. Further, thresholds are application- and workload-dependent. We present COLA, an autoscaler for microservice-based applications. COLA learns autoscaling policies tailored to applications and their workloads and allocates VMs to an application's microservices to meet end-to-end latency targets while minimizing dollar cost. Using 5 open-source applications, we compare COLA to several utilization-based and machine-learning-based autoscalers. COLA meets a desired median or tail latency target on 31 of 39 workloads, where it provides an average cost reduction of 30.9% over the next cheapest autoscaler. COLA is the most cost-effective autoscaling policy for 27 of these 31 workloads. The cost savings from managing a cluster with COLA allow it to recoup its training cost within a few days.
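For concreteness, the utilization-based baseline the abstract describes can be sketched as follows. This is a minimal illustration, not COLA's method: the metric source, scaling call, threshold value, and polling interval are all hypothetical stand-ins.

```python
# Minimal sketch of per-microservice utilization-threshold autoscaling,
# the baseline the abstract describes. All names (get_cpu_utilization,
# add_vm) and values (0.70 threshold, 30 s poll) are hypothetical.
import time

CPU_THRESHOLD = 0.70   # hypothetical per-microservice utilization threshold
POLL_INTERVAL_S = 30   # hypothetical sampling period


def get_cpu_utilization(service: str) -> float:
    """Stub: return the service's average CPU utilization in [0, 1]."""
    raise NotImplementedError("replace with your metrics backend")


def add_vm(service: str) -> None:
    """Stub: provision one additional VM for the service."""
    raise NotImplementedError("replace with your cloud provider's API")


def autoscale(services: list[str]) -> None:
    # Each microservice is scaled independently against its own
    # threshold; end-to-end latency is never consulted. This per-service
    # view is the gap COLA targets.
    while True:
        for svc in services:
            if get_cpu_utilization(svc) > CPU_THRESHOLD:
                add_vm(svc)
        time.sleep(POLL_INTERVAL_S)
```

Because the loop reacts only to local utilization, a single global threshold cannot account for how latency accumulates across microservices, which motivates learning per-application, per-workload policies as COLA does.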