We develop a Markovian framework for load balancing in which classical algorithms such as Power-of-$d$ are combined with auto-scaling mechanisms that allow the net service capacity to scale up or down in response to the current load on the same timescale as the job dynamics. Our framework is inspired by serverless platforms such as Knative, where servers are software functions that can be flexibly instantiated in milliseconds according to scaling rules defined by the users of the platform. The main question is how to design such scaling rules to minimize user-perceived delay while guaranteeing low energy consumption. For the first time, we investigate this problem when the auto-scaling and load balancing processes operate \emph{asynchronously}, as in Knative. One advantage induced by asynchronism is that jobs do not necessarily need to wait whenever a scale-up decision is taken. In our main result, we find a general condition on the structure of scaling rules that drives the mean-field dynamics to delay and relative energy optimality, i.e., a situation where both the user-perceived delay and the relative energy wastage induced by idle servers vanish in the limit where the network demand grows to infinity in proportion to the nominal service capacity. The identified condition suggests scaling up the current net capacity if and only if the mean demand exceeds the rate at which servers become idle and active. Finally, we propose \emph{Rate-Idle}, a scaling rule that satisfies our optimality condition, and we show by means of numerical simulations that it improves delay performance over existing (synchronous) schemes.
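To make the two ingredients named above concrete, the following is a minimal Python sketch, not the paper's implementation. It shows the standard Power-of-$d$ dispatching step and a direct reading of the stated scale-up condition. The function names `power_of_d` and `should_scale_up`, and the parameters `mean_demand` and `idle_active_rate`, are hypothetical names introduced here for illustration; how these rates are measured or estimated is not specified in the abstract.

```python
import random


def power_of_d(queue_lengths, d, rng=random):
    """Standard Power-of-d dispatching: sample d distinct servers
    uniformly at random and return the index of the one with the
    shortest queue."""
    candidates = rng.sample(range(len(queue_lengths)), d)
    return min(candidates, key=lambda i: queue_lengths[i])


def should_scale_up(mean_demand, idle_active_rate):
    """Assumed reading of the abstract's condition: scale up if and
    only if the mean demand exceeds the rate at which servers become
    idle and active. Both arguments are hypothetical inputs."""
    return mean_demand > idle_active_rate


if __name__ == "__main__":
    queues = [3, 0, 5, 2]
    # With d equal to the number of servers, every server is sampled,
    # so the globally least loaded server is chosen.
    print(power_of_d(queues, d=len(queues)))
    print(should_scale_up(mean_demand=10.0, idle_active_rate=7.5))
```

In this sketch the dispatching and scaling decisions are separate functions that share no state, which loosely mirrors the asynchronous operation described above, where the load balancer does not wait for the auto-scaler.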