Decentralized Markov Chain Gradient Descent

The decentralized stochastic gradient method has emerged as a promising approach to large-scale machine learning problems. This paper studies the decentralized Markov chain gradient descent (DMGD) algorithm, a variant of the decentralized stochastic gradient method in which the random samples are taken along the trajectory of a Markov chain. This setting is well motivated when obtaining independent samples is costly or impossible, which rules out traditional stochastic gradient algorithms. Specifically, we consider the first- and zeroth-order versions of decentralized Markov chain gradient descent over a connected network, where each node communicates only with its neighbors about intermediate results. We rigorously establish the nonergodic convergence and the ergodic convergence rate of the proposed algorithms, and highlight their critical dependence on the network topology and the mixing time of the Markov chain. Numerical tests further validate the sample efficiency of our algorithm.
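As a rough illustration of the first-order setting described in the abstract, the sketch below runs a decentralized gradient iteration on a ring of four nodes, where each node draws its sample index from its own Markov chain instead of sampling independently. The abstract does not state the update rule, so the form used here (average with neighbors, then take a local gradient step) is an assumption, as are the names `ring_mixing_matrix` and `dmgd`, the step-size schedule, the lazy random walk over sample indices, and the toy least-squares data.

```python
import random


def ring_mixing_matrix(n):
    """Doubly stochastic weights for a ring: 1/2 on self, 1/4 on each neighbor."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.5
        W[i][(i - 1) % n] += 0.25
        W[i][(i + 1) % n] += 0.25
    return W


def dmgd(local_data, W, steps=2000, seed=0):
    """Assumed DMGD-style iteration for scalar least squares.

    local_data[i] is node i's list of (a, b) samples; node i minimizes the
    average of 0.5 * (a * x - b) ** 2 over its samples.
    """
    rng = random.Random(seed)
    n = len(local_data)
    x = [0.0] * n        # one scalar iterate per node
    state = [0] * n      # current Markov-chain state (sample index) per node
    for k in range(steps):
        gamma = 0.2 / (k + 1) ** 0.5  # diminishing step size (assumed schedule)
        # Gradient of the single sample selected by each node's chain.
        grads = []
        for i in range(n):
            a, b = local_data[i][state[i]]
            grads.append(a * (a * x[i] - b))
        # Mix with neighbors through W, then take the local gradient step.
        x = [
            sum(W[i][j] * x[j] for j in range(n)) - gamma * grads[i]
            for i in range(n)
        ]
        # Advance each chain: lazy random walk on a cycle of sample indices.
        for i in range(n):
            state[i] = (state[i] + rng.choice((-1, 0, 1))) % len(local_data[i])
    return x


# Toy data: every sample satisfies b = a * X_TRUE, so the minimizer is X_TRUE.
X_TRUE = 3.0
data = [[(a, a * X_TRUE) for a in (1.0, 1.2, 1.5)] for _ in range(4)]
estimates = dmgd(data, ring_mixing_matrix(4))
print(estimates)
```

Consecutive samples at a node are correlated because the index moves by at most one position per step. The toy data are noise-free, so this only shows the mechanics of the iteration and says nothing about the rates established in the paper.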


Related content

Markov chain

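A Markov chain, named after Andrey Markov (A. A. Markov, 1856-1922), is a discrete-time stochastic process with the Markov property. In such a process, given the present state, the past (the states before the present) is irrelevant for predicting the future (the states after the present). At each step of a Markov chain, the system may move from one state to another or remain in its current state, according to a probability distribution. A change of state is called a transition, and the probabilities associated with the possible state changes are called transition probabilities. A random walk is an example of a Markov chain: the state at each step is a vertex of a graph, and each step moves to any adjacent vertex with equal probability, regardless of the path taken so far.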
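The random walk on a graph can be sketched in a few lines. The example below is a minimal illustration, assuming the graph is given as a dictionary of adjacency lists; the names `transition_probabilities` and `random_walk` and the small four-vertex graph are hypothetical.

```python
import random


def transition_probabilities(neighbors):
    """Each vertex moves to any of its neighbors with equal probability."""
    return {u: {v: 1.0 / len(vs) for v in vs} for u, vs in neighbors.items()}


def random_walk(neighbors, start, steps, seed=0):
    """Simulate the walk; the next vertex depends only on the current one."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(neighbors[path[-1]]))
    return path


# Hypothetical undirected graph: a triangle A-B-C with a pendant vertex D on C.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
P = transition_probabilities(graph)
walk = random_walk(graph, "A", steps=10)
print(P["C"])
print(walk)
```

The Markov property shows up in `random_walk`: the choice of the next vertex reads only `path[-1]` and never the earlier history.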
