Modern computing workloads are often composed of parallelizable jobs. A parallelizable job can be completed more quickly when run on additional servers. However, each job can only use a limited number of servers, known as its parallelizability level, which is determined by the type of computation the job performs and how it is implemented. Workloads generally consist of multiple job classes, where jobs from different classes have different parallelizability levels and follow different job size (service requirement) distributions. This paper considers scheduling parallelizable jobs belonging to an arbitrary number of job classes. Given a limited number of servers, we must allocate servers across a stream of arriving jobs to minimize mean response time -- the average time from when a job arrives to the system until it completes. We find that in lighter-load scaling regimes (i.e., Sub-Halfin-Whitt), the optimal allocation policy is Least-Parallelizable-First (LPF), which prioritizes jobs from the least parallelizable job classes regardless of their size distributions. By contrast, we find that in the heavier-load regimes (i.e., Super-NDS), the optimal allocation policy prioritizes jobs with the Shortest Expected Remaining Processing Time (SERPT). We also develop policies that are asymptotically optimal when the scaling regime is not known a priori.
翻译:现代计算工作负载通常由可并行作业组成。可并行作业在更多服务器上运行时能够更快完成。然而,每个作业仅能使用有限数量的服务器,这一限制称为其可并行化水平,该水平由作业执行的计算类型及其实现方式决定。工作负载通常包含多个作业类别,不同类别的作业具有不同的可并行化水平,并遵循不同的作业规模(服务需求)分布。本文研究属于任意数量作业类别的可并行作业的调度问题。在给定有限服务器数量的条件下,我们需要在到达的作业流中分配服务器,以最小化平均响应时间——即从作业到达系统至完成所需的平均时间。研究发现,在轻负载扩展机制(即Sub-Halfin-Whitt区域)下,最优分配策略为最小可并行化优先(LPF),该策略优先处理可并行化水平最低的作业类别,而不考虑其规模分布。相反,在重负载机制(即Super-NDS区域)下,最优分配策略优先处理具有最短预期剩余处理时间(SERPT)的作业。本文还开发了在扩展机制未知时仍能保持渐近最优性的调度策略。