In an overloaded FaaS cluster, individual worker nodes strain under lengthening queues of requests. Although the cluster may eventually be scaled horizontally, adding a new node takes dozens of seconds. Because serving applications are tuned for tail latency, and tail latency increases sharply under heavier load, the current workaround is resource over-provisioning. In fact, even though a service can withstand a steady load of, e.g., 70% CPU utilization, the autoscaler is triggered at, e.g., 30-40% (thus the service uses twice as many nodes as needed). We propose an alternative: a worker-level method that handles heavy load without increasing the number of nodes. Unlike interactive applications such as text editors, FaaS executions are not interactive: end-users gain nothing from a process receiving the CPU frequently but only for short periods. Inspired by scheduling methods for High Performance Computing, we take the radical step of replacing classic OS preemption with (1) queuing requests based on their historical characteristics; and (2) once a request starts processing, setting its CPU limit to exactly one core (with no CPU oversubscription). We extend OpenWhisk and measure the efficiency of the proposed solutions using the SeBS benchmark. In a loaded system, our method decreases the average response time by a factor of 4. The improvement is even greater for shorter requests, as the average stretch decreases by a factor of 18. As a result, we show that 3 machines running our method provide better response-time statistics than a 4-machine baseline.
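The two mechanisms named above (history-based queuing, and one dedicated core per running request) can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the OpenWhisk extension itself: the abstract does not specify the queue ordering, so "shortest historical mean runtime first" is assumed here; stretch is taken to be response time divided by processing time; and the function names, the `HISTORY` table, and the trace are all hypothetical.

```python
import heapq
from statistics import mean

# Hypothetical historical mean run times per function, in seconds.
HISTORY = {"long": 4.0, "short": 1.0}

def simulate(requests, cores, priority):
    """Non-preemptive simulation: a running request holds exactly one core.

    requests: list of (arrival_time, function_name, run_time)
    priority: maps a request to a sortable key (lower key starts first)
    Returns one (response_time, stretch) pair per request, in start order.
    """
    pending = sorted(requests)   # ordered by arrival time
    waiting = []                 # heap of (key, seq, request)
    running = []                 # heap of finish times, one per busy core
    results = []
    now, i, seq = 0.0, 0, 0
    while i < len(pending) or waiting:
        # Admit every request that has arrived by now.
        while i < len(pending) and pending[i][0] <= now:
            heapq.heappush(waiting, (priority(pending[i]), seq, pending[i]))
            seq += 1
            i += 1
        # Release cores whose request has finished.
        while running and running[0] <= now:
            heapq.heappop(running)
        # Start queued requests while a core is free (no oversubscription).
        while waiting and len(running) < cores:
            _, _, (arrival, _name, run) = heapq.heappop(waiting)
            finish = now + run
            heapq.heappush(running, finish)
            response = finish - arrival
            results.append((response, response / run))
        # Jump to the next event: an arrival or a completion.
        events = []
        if i < len(pending):
            events.append(pending[i][0])
        if running:
            events.append(running[0])
        if not events:
            break
        now = max(now, min(events))
    return results

def fifo_key(request):
    return request[0]            # arrival order

def history_key(request):
    return HISTORY[request[1]]   # assumed: shortest expected run time first

# Hypothetical trace on a single core: two long calls, then two short ones.
trace = [(0, "long", 4), (1, "long", 4), (2, "short", 1), (3, "short", 1)]
fifo = simulate(trace, cores=1, priority=fifo_key)
hist = simulate(trace, cores=1, priority=history_key)

fifo_response = mean(r for r, _ in fifo)
hist_response = mean(r for r, _ in hist)
fifo_stretch = mean(s for _, s in fifo)
hist_stretch = mean(s for _, s in hist)
print(fifo_response, hist_response, fifo_stretch, hist_stretch)
```

On this toy trace, history-based ordering lowers both the average response time and the average stretch relative to FIFO, with the larger relative gain in stretch because the short requests no longer wait behind a long one. The magnitudes say nothing about the measured results reported above.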