Resource provisioning plays a pivotal role in determining the right amount of infrastructure resources needed to run applications and in meeting global decarbonization goals. A significant portion of production clusters is now dedicated to long-running applications (LRAs), which typically take the form of microservices and run for hours or even months. It is therefore of practical importance to plan the placement of LRAs in a shared cluster in advance, so that the number of compute nodes they require is minimized, reducing both carbon footprint and operational costs. Existing work on LRA scheduling is often application-agnostic and does not specifically address the constraints imposed by LRAs, such as co-location affinity constraints and time-varying resource requirements. In this paper, we present an affinity-aware resource provisioning approach for deploying large-scale LRAs in a shared cluster subject to multiple constraints, with the objective of minimizing the number of compute nodes in use. We investigate a broad range of solution algorithms that fall into three main categories (Application-Centric, Node-Centric, and Multi-Node approaches) and tune them for typical large-scale real-world scenarios. Experimental studies driven by the Alibaba Tianchi dataset show that our algorithms achieve competitive scheduling effectiveness and running time compared with the heuristics used in recent work, including Medea and LraSched.
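To make the problem statement concrete, the following is a minimal illustrative sketch. It is not one of the algorithms evaluated in this paper, but a generic first-fit-decreasing bin-packing heuristic extended with a hard pairwise anti-affinity check. It assumes a single resource dimension (CPU), a constant rather than time-varying demand, and identical nodes. All names (`Container`, `Node`, `can_host`, `first_fit_decreasing`, `conflicts`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    app: str      # name of the LRA this container belongs to
    cpu: float    # CPU demand (assumed constant over time in this sketch)


@dataclass
class Node:
    capacity: float
    used: float = 0.0
    apps: set = field(default_factory=set)  # names of apps hosted on this node


def can_host(node, container, conflicts):
    """Return True if the container fits and conflicts with no app on the node."""
    if node.used + container.cpu > node.capacity:
        return False
    # conflicts is a set of frozenset pairs of app names that must not share a node
    return all(frozenset((container.app, other)) not in conflicts
               for other in node.apps)


def first_fit_decreasing(containers, capacity, conflicts):
    """Place containers largest-first on the first feasible node.

    A new node is opened only when no existing node can host the container,
    so the number of returned nodes is the quantity being kept small.
    """
    nodes = []
    for c in sorted(containers, key=lambda c: c.cpu, reverse=True):
        if c.cpu > capacity:
            raise ValueError(f"container of {c.app} exceeds node capacity")
        target = next((n for n in nodes if can_host(n, c, conflicts)), None)
        if target is None:
            target = Node(capacity)
            nodes.append(target)
        target.used += c.cpu
        target.apps.add(c.app)
    return nodes
```

With two containers of 4 cores each and 8-core nodes, this sketch uses one node when there are no constraints and two nodes when the two applications are declared as conflicting, which illustrates how affinity constraints can increase the number of nodes required.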