"As many of us know from bitter experience, the policies provided in extant operating systems, which are claimed to work well and behave fairly 'on the average', often fail to do so in the special cases important to us" [Wulf et al. 1974]. Written in 1974, these words motivated moving policy decisions into user-space. Today, as warehouse-scale computers (WSCs) have become ubiquitous, it is time to move policy decisions away from individual servers altogether. Built-in policies are complex and often exhibit bad performance at scale. Meanwhile, the highly-controlled WSC setting presents opportunities to improve performance and predictability. We propose moving all policy decisions from the OS kernel to the cluster manager (CM), in a new paradigm we call Grape CM. In this design, the role of the kernel is reduced to monitoring, sending metrics to the CM, and executing policy decisions made by the CM. The CM uses metrics from all kernels across the WSC to make informed policy choices, sending commands back to each kernel in the cluster. We claim that Grape CM will improve performance, transparency, and simplicity. Our initial experiments show how the CM can identify the optimal set of huge pages for any workload or improve memcached latency by 15%.