Parameter servers (PSs) facilitate the implementation of distributed training for large machine learning tasks. In this paper, we argue that existing PSs are inefficient for tasks that exhibit non-uniform parameter access; their performance may even fall behind that of single-node baselines. We identify two major sources of such non-uniform access: skew and sampling. Existing PSs are ill-suited for managing skew because they uniformly apply the same parameter management technique to all parameters. They are inefficient for sampling because the PS is oblivious to the associated randomized accesses and cannot exploit locality. To overcome these performance limitations, we introduce NuPS, a novel PS architecture that (i) integrates multiple management techniques and employs a suitable technique for each parameter and (ii) supports sampling directly via suitable sampling primitives and sampling schemes that allow for a controlled quality--efficiency trade-off. In our experimental study, NuPS outperformed existing PSs by up to one order of magnitude and provided up to linear scalability across multiple machine learning tasks.
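To make the two architectural ideas concrete, the following is a minimal, single-process Python sketch. It is not the NuPS API: all names (`NonUniformPS`, `sample_pull`, the locality bias constant) are hypothetical, and the real system is a distributed PS. The sketch only illustrates (i) choosing a management technique per parameter, here replication for hot (skewed) keys versus direct access for cold keys, and (ii) a sampling primitive through which the PS itself draws keys and can therefore prefer local ones, trading sampling quality for efficiency.

```python
import random

class NonUniformPS:
    """Toy PS that manages hot and cold parameters differently (hypothetical sketch)."""

    def __init__(self, params, hot_keys):
        self.store = dict(params)        # authoritative parameter values
        self.hot_keys = set(hot_keys)    # frequently accessed (skewed) keys
        self.replica_cache = {}          # local replicas of hot keys

    def pull(self, key):
        # Hot keys: serve from a (possibly stale) local replica -> cheap repeated reads.
        if key in self.hot_keys:
            if key not in self.replica_cache:
                self.replica_cache[key] = self.store[key]
            return self.replica_cache[key]
        # Cold keys: access the authoritative copy directly
        # (stands in for classic remote access / relocation).
        return self.store[key]

    def sample_pull(self, keys, weights, n, locality_bias=0.5):
        # Sampling primitive: the PS draws the keys itself, so it can prefer
        # keys that are already local (here: already replicated). The bias
        # parameter controls the quality--efficiency trade-off.
        local = [k for k in keys if k in self.replica_cache]
        drawn = []
        for _ in range(n):
            if local and random.random() < locality_bias:
                drawn.append(random.choice(local))          # cheap local draw
            else:
                drawn.append(random.choices(keys, weights=weights, k=1)[0])
        return {k: self.pull(k) for k in drawn}

# Usage: two hot keys are replicated; sampled pulls are biased toward them.
ps = NonUniformPS({f"w{i}": float(i) for i in range(10)}, hot_keys=["w0", "w1"])
ps.pull("w0")
batch = ps.sample_pull(list(ps.store), weights=[1.0] * 10, n=4)
```

The design point this sketch makes is the one the abstract argues: replication pays off only for the skewed hot keys, so the technique must be chosen per parameter, and letting the PS perform the sampling itself (rather than serving arbitrary randomized accesses it cannot predict) is what allows it to exploit locality at all.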