Decentralized stochastic gradient methods have emerged as a promising approach to solving large-scale machine learning problems. This paper studies the decentralized Markov chain gradient descent (DMGD) algorithm, a variant of decentralized stochastic gradient methods in which the random samples are taken along the trajectory of a Markov chain. This setting is well motivated when obtaining independent samples is costly or impossible, which precludes the use of traditional stochastic gradient algorithms. Specifically, we consider the first- and zeroth-order versions of decentralized Markov chain gradient descent over a connected network, where each node communicates only with its neighbors about intermediate results. The nonergodic convergence and the ergodic convergence rate of the proposed algorithms are rigorously established, and their critical dependence on the network topology and the mixing time of the Markov chain is highlighted. Numerical tests further validate the sample efficiency of our algorithms.
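To make the setting concrete, the following is a minimal sketch (not the paper's implementation) of a first-order DMGD-style update: each node keeps a local copy of the parameters, draws its next sample by advancing a local Markov chain rather than sampling i.i.d., and mixes its iterate with its neighbors through a doubly stochastic matrix encoding the network topology. The helper names `grad`, `mixing_W`, and `transition_P` are assumptions introduced here for illustration.

```python
import numpy as np

def dmgd(grad, mixing_W, transition_P, x0, n_states, step, n_iters, rng):
    """Illustrative sketch of decentralized Markov chain gradient descent.

    Assumptions (hypothetical, not from the paper):
      grad(i, s, x)     -- node i's stochastic gradient at parameters x for data state s
      mixing_W          -- doubly stochastic matrix matching the communication graph
      transition_P[i]   -- node i's Markov chain transition matrix over its data states
    """
    n_nodes = mixing_W.shape[0]
    X = np.tile(x0, (n_nodes, 1))                  # one parameter copy per node
    states = rng.integers(n_states, size=n_nodes)  # current Markov chain state per node
    for _ in range(n_iters):
        G = np.stack([grad(i, states[i], X[i]) for i in range(n_nodes)])
        # gossip averaging with neighbors, then a local gradient step
        X = mixing_W @ X - step * G
        # advance each node's Markov chain instead of drawing independent samples
        states = np.array([
            rng.choice(n_states, p=transition_P[i][states[i]])
            for i in range(n_nodes)
        ])
    return X.mean(axis=0)  # network-average iterate as the output
```

A zeroth-order variant would replace `grad` with a finite-difference estimate of the gradient built from function evaluations only, with the rest of the communication and Markov-chain sampling pattern unchanged.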