The most popular framework for distributed training of machine learning models is the (synchronous) parameter server (PS). This paradigm consists of $n$ workers, which iteratively compute updates of the model parameters, and a stateful PS, which waits for and aggregates all updates to generate a new estimate of the model parameters and sends it back to the workers for a new iteration. Transient computation slowdowns or transmission delays can intolerably lengthen the time of each iteration. An efficient way to mitigate this problem is to let the PS wait only for the fastest $n-b$ updates before generating the new parameters. The slowest $b$ workers are called backup workers. The optimal number $b$ of backup workers depends on the cluster configuration and workload, but also (as we show in this paper) on the hyper-parameters of the learning algorithm and the current stage of the training. We propose DBW, an algorithm that dynamically decides the number of backup workers during the training process to maximize the convergence speed at each iteration. Our experiments show that DBW 1) removes the need to tune $b$ through preliminary time-consuming experiments, and 2) makes the training up to a factor of $3$ faster than the optimal static configuration.
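To make the backup-workers rule concrete, the following is a minimal toy sketch (not from the paper) of one synchronous PS round that aggregates only the fastest $n-b$ updates; the worker delays and gradients are simulated, and all names (`ps_round`, the exponential delay model) are illustrative assumptions.

```python
import numpy as np

def ps_round(n, b, grads, arrival_times):
    """One synchronous PS round with b backup workers:
    aggregate only the n - b updates that arrive first."""
    fastest = np.argsort(arrival_times)[: n - b]      # indices of the n - b quickest workers
    round_time = arrival_times[fastest].max()         # PS waits only for the slowest of these
    avg_grad = np.mean(grads[fastest], axis=0)        # average the selected updates
    return avg_grad, round_time

# Toy usage: 10 workers, 2 backup workers, 4-dimensional "gradients".
rng = np.random.default_rng(0)
n, b = 10, 2
grads = rng.normal(size=(n, 4))                      # one gradient per worker
times = rng.exponential(scale=1.0, size=n)           # hypothetical per-worker delays
update, t = ps_round(n, b, grads, times)
print(f"aggregated {n - b} of {n} updates, round time {t:.2f}")
```

The trade-off DBW navigates is visible even in this sketch: a larger $b$ shortens `round_time` but averages fewer gradients per iteration, so the best choice shifts with the delay distribution and the training stage.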