Kubernetes (k8s) has the potential to merge the distributed edge and the cloud, but it lacks a scheduling framework designed specifically for edge-cloud systems. In addition, the hierarchical distribution of heterogeneous resources and the complex dependencies among requests and resources make the modeling and scheduling of k8s-oriented edge-cloud systems particularly challenging. In this paper, we introduce KaiS, a learning-based scheduling framework for such edge-cloud systems that aims to improve the long-term throughput rate of request processing. First, we design a coordinated multi-agent actor-critic algorithm to handle decentralized request dispatch and dynamic dispatch spaces within the edge cluster. Second, to accommodate diverse system scales and structures, we use graph neural networks to embed system state information, and we combine the embedding results with multiple policy networks to reduce the dimensionality of orchestration through stepwise scheduling. Finally, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration, and we present an implementation design that deploys these algorithms in a way compatible with native k8s components. Experiments using real workload traces show that KaiS can learn appropriate scheduling policies regardless of request arrival patterns and system scales. Moreover, compared to the baselines, KaiS can increase the average system throughput rate by 14.3% while reducing scheduling cost by 34.7%.
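The abstract names two mechanisms without detailing them: dispatch over a dynamic dispatch space, and a two-time-scale split between fast request dispatch and slower service orchestration. The following is a minimal sketch under stated assumptions, not KaiS's actual algorithm. It assumes the dynamic dispatch space is handled by masking a fixed-size policy output to the nodes that currently host the requested service (a common technique; the abstract does not confirm KaiS uses it), and it replaces the learned actor, critic, and GNN embedding with hypothetical hand-written rules (`actor_logits`, `orchestrate`). All function names, the one-request-per-slot service rate, and the drop-if-no-replica behavior are illustrative assumptions.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over valid actions only; masked-out actions get probability 0.

    Lets a fixed-size policy output serve a dispatch space whose valid
    entries change over time (here: nodes currently hosting the service)."""
    if not any(mask):
        raise ValueError("no valid dispatch target")
    top = max(l for l, m in zip(logits, mask) if m)  # subtract max for stability
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def actor_logits(queue_lengths):
    """Hypothetical stand-in for a learned actor: prefer shorter queues."""
    return [-float(q) for q in queue_lengths]


def orchestrate(placement, demand):
    """Hypothetical slow-time-scale step: deploy the most-demanded service
    on the node with the fewest services that does not host it yet."""
    if not demand:
        return
    service = max(sorted(demand), key=lambda s: demand[s])
    candidates = [n for n in sorted(placement) if service not in placement[n]]
    if candidates:
        target = min(candidates, key=lambda n: len(placement[n]))
        placement[target].add(service)


def run(requests, placement, period, seed=0):
    """Two time scales: dispatch every slot, orchestrate every `period` slots.

    requests[t] lists the services requested in slot t; placement maps each
    node to its set of deployed services and is mutated by orchestration.
    Returns the (slot, service, node) decisions and the orchestration count."""
    rng = random.Random(seed)
    nodes = sorted(placement)
    queues = {n: 0 for n in nodes}
    demand, decisions, orchestrations = {}, [], 0
    for t, arrivals in enumerate(requests):
        if t > 0 and t % period == 0:
            orchestrate(placement, demand)
            demand, orchestrations = {}, orchestrations + 1
        for service in arrivals:
            demand[service] = demand.get(service, 0) + 1
            mask = [service in placement[n] for n in nodes]
            if not any(mask):
                continue  # no replica yet: the request is dropped in this sketch
            probs = masked_softmax(actor_logits([queues[n] for n in nodes]), mask)
            node = rng.choices(nodes, weights=probs, k=1)[0]
            queues[node] += 1
            decisions.append((t, service, node))
        for n in nodes:  # assumption: each node completes one request per slot
            queues[n] = max(0, queues[n] - 1)
    return decisions, orchestrations
```

In this sketch, requests for a service that no node hosts are dropped until the slower orchestration step deploys a replica, after which the masked policy can only route them to nodes that host it. A learned critic and GNN state embedding, as described in the abstract, would replace the hand-written rules.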