Modern data centers serve workloads which are capable of exploiting parallelism. When a job parallelizes across multiple servers it will complete more quickly, but jobs receive diminishing returns from being allocated additional servers. Because allocating multiple servers to a single job is inefficient, it is unclear how best to allocate a fixed number of servers between many parallelizable jobs. This paper provides the first optimal allocation policy for minimizing the mean slowdown of parallelizable jobs of known size when all jobs are present at time 0. Our policy provides a simple closed form formula for the optimal allocations at every moment in time. Minimizing mean slowdown usually requires favoring short jobs over long ones (as in the SRPT policy). However, because parallelizable jobs have sublinear speedup functions, system efficiency is also an issue. System efficiency is maximized by giving equal allocations to all jobs and thus competes with the goal of prioritizing small jobs. Our optimal policy, high-efficiency SRPT (heSRPT), balances these competing goals. heSRPT completes jobs according to their size order, but maintains overall system efficiency by allocating some servers to each job at every moment in time. Our results generalize to also provide the optimal allocation policy with respect to mean flow time. Finally, we consider the online case where jobs arrive to the system over time. While optimizing mean slowdown in the online setting is even more difficult, we find that heSRPT provides an excellent heuristic policy for the online setting. In fact, our simulations show that heSRPT significantly outperforms state-of-the-art allocation policies for parallelizable jobs.
翻译:现代数据中心服务于能够利用平行关系的工作量。 当一个工作在多个服务器上平行工作时, 它会更快地完成, 但工作会从分配到更多服务器后得到越来越少的回报。 因为将多个服务器分配到一个工作是效率低下的, 如何在多个平行工作之间分配固定数量的服务器, 并不清楚如何最佳地分配固定数量的服务器。 本文提供了第一个最佳的分配政策, 以便在所有工作出现时, 最大限度地减少已知规模的平行工作的平均减速。 我们的政策提供了一个简单的封闭形式公式, 用于在每一个时刻最佳分配。 尽可能减少平均减速通常需要短期工作( SRPT政策) 。 然而, 由于平行工作具有亚线性加速功能, 系统效率也是个问题。 通过对所有工作平等分配, 系统效率得到最大化, 从而与小工作的优先排序相竞争。 我们的最佳政策, 高效的SRPT(hePT), 平衡这些竞争的目标。 HeSRPT根据它们的大小顺序完成工作, 并且保持整个系统的效率, 通过在每一时刻将一些服务器分配给长期工作。 我们的平行分配结果, 我们的Slentalalalalalalalalal dalalal dal