Distributed optimization has become one of the standard ways of speeding up machine learning training, and most of the research in the area focuses on distributed first-order, gradient-based methods. Yet, there are settings where some computationally-bounded nodes may not be able to implement first-order, gradient-based optimization, while they could still contribute to joint optimization tasks. In this paper, we initiate the study of hybrid decentralized optimization, examining settings where nodes with zeroth-order and first-order optimization capabilities co-exist in a distributed system and jointly attempt to solve an optimization task over some data distribution. We essentially show that, under reasonable parameter settings, such a system can not only withstand the noisier zeroth-order agents but can even benefit from integrating such agents into the optimization process, rather than ignoring their information. At the core of our approach is a new analysis of distributed optimization with noisy and possibly-biased gradient estimators, which may be of independent interest. Experimental results on standard optimization tasks confirm our analysis, showing that hybrid first- and zeroth-order optimization can be practical.
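To make the distinction between the two agent types concrete, the following is a minimal sketch (not the paper's algorithm) of a two-point zeroth-order gradient estimator: the kind of noisy, possibly-biased gradient surrogate that a node without gradient access could contribute to the joint optimization. The function names, the smoothing parameter `mu`, and the number of sampled directions are illustrative assumptions.

```python
import numpy as np


def zeroth_order_gradient(f, x, mu=1e-4, num_directions=10, rng=None):
    """Estimate the gradient of f at x using only function evaluations.

    Averages finite-difference estimates along random Gaussian directions;
    the result is a noisy (and, for mu > 0, slightly biased) surrogate for
    the true gradient, in contrast to the exact gradients available to
    first-order nodes.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    grad_estimate = np.zeros(d)
    for _ in range(num_directions):
        u = rng.standard_normal(d)                        # random direction
        grad_estimate += (f(x + mu * u) - f(x)) / mu * u  # directional finite difference
    return grad_estimate / num_directions


if __name__ == "__main__":
    # Example: for f(x) = ||x||^2 a first-order node would use the exact
    # gradient 2 * x, while a zeroth-order node can only query f.
    f = lambda x: float(np.dot(x, x))
    x = np.ones(5)
    print(zeroth_order_gradient(f, x))  # approximately 2 * x, up to sampling noise
```

Such an estimator is unbiased only in the limit mu -> 0 and has variance growing with the dimension, which is why integrating zeroth-order agents alongside first-order ones requires the kind of noise- and bias-aware analysis developed in the paper.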