Parameter servers (PSs) ease the implementation of distributed training for large machine learning (ML) tasks by providing primitives for shared parameter access. Especially for ML tasks that access parameters sparsely, PSs can achieve high efficiency and scalability. To do so, they employ a number of techniques -- such as replication or relocation -- to reduce the communication cost and/or latency of parameter accesses. However, a suitable choice and parameterization of these techniques are crucial to realizing these gains. Unfortunately, such choices depend on the task, the workload, and even individual parameters; they often require expensive upfront experimentation; and they are susceptible to workload changes. In this paper, we explore whether PSs can automatically adapt to the workload without any prior tuning. Our goals are to improve usability and to maintain (or even improve) efficiency. We propose (i) a novel intent signaling mechanism that acts as an enabler for adaptivity and integrates naturally into ML tasks, and (ii) AdaPS, a fully adaptive, zero-tuning PS based on this mechanism. Our experimental evaluation suggests that automatic adaptation to the workload is indeed possible: AdaPS matched or outperformed state-of-the-art PSs out of the box.
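To make the shared-access primitives and the intent signaling idea concrete, the following is a minimal, self-contained Python sketch of what a PS client with an intent primitive might look like. All names here (IntentPS, signal_intent, pull, push) and the single-process design are illustrative assumptions for exposition, not the actual AdaPS interface or implementation.

```python
# Hypothetical sketch of a PS client with an intent-signaling primitive.
# A real PS would be distributed; this toy keeps everything in-process.

class IntentPS:
    """Toy stand-in for a parameter server client (illustrative only)."""

    def __init__(self):
        self.store = {}    # key -> parameter value
        self.intents = {}  # key -> logical clock until which access is intended

    def signal_intent(self, keys, end_clock):
        # The worker declares which parameters it will access and until when.
        # A real adaptive PS could use these signals to decide, per parameter,
        # whether to replicate a hot key or relocate a rarely shared one
        # *before* the accesses happen.
        for k in keys:
            self.intents[k] = end_clock

    def pull(self, keys):
        # Read current parameter values (0.0 for keys not yet written).
        return [self.store.get(k, 0.0) for k in keys]

    def push(self, keys, deltas):
        # Apply additive updates, as is typical for gradient-style updates.
        for k, d in zip(keys, deltas):
            self.store[k] = self.store.get(k, 0.0) + d


# Usage: signal intent for the (sparse) keys of the next batch, then access them.
ps = IntentPS()
batch_keys = [3, 17, 42]                  # sparse parameter accesses
ps.signal_intent(batch_keys, end_clock=1)
values = ps.pull(batch_keys)
ps.push(batch_keys, [0.1, -0.2, 0.05])
```

The point of the sketch is the ordering: intent arrives ahead of the accesses, which is what would let an adaptive PS choose and parameterize techniques such as replication or relocation per parameter, without any prior tuning by the user.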