We consider the stochastic gradient descent (SGD) algorithm driven by a general stochastic sequence, including i.i.d. noise and a random walk on an arbitrary graph, among others, and analyze it in the asymptotic sense. Specifically, we apply the notion of 'efficiency ordering', a well-analyzed tool for comparing the performance of Markov chain Monte Carlo (MCMC) samplers, to SGD algorithms, in the form of a Loewner ordering of the covariance matrices associated with the scaled iterate errors in the long term. Using this ordering, we show that input sequences that are more efficient for MCMC sampling also lead to a smaller covariance of the errors for SGD algorithms in the limit. This also suggests that an arbitrarily weighted MSE of the SGD iterates in the limit becomes smaller when the algorithm is driven by more efficient chains. Our finding is of particular interest in applications such as decentralized optimization and swarm learning, where SGD is implemented in a random walk fashion on the underlying communication graph for reasons of cost and/or data privacy. We demonstrate how certain non-Markovian processes, for which typical mixing-time based non-asymptotic bounds are intractable, can outperform their Markovian counterparts in the sense of efficiency ordering for SGD. We show the utility of our method by applying it to gradient descent with shuffling and mini-batch gradient descent, reaffirming key results from the existing literature under a unified framework. Empirically, we also observe efficiency ordering for variants of SGD such as accelerated SGD and Adam, opening up the possibility of extending our notion of efficiency ordering to a broader family of stochastic optimization algorithms.
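As a brief illustration of what a Loewner ordering of limiting covariances means, and why it implies a smaller weighted MSE, the following sketch may help. The symbols V_X, V_Y, W, z and d are assumed here for illustration only and are not taken from the abstract; the precise scaling of the iterate errors is left unspecified.

```latex
% Illustrative notation (assumed, not taken from the text):
% V_X, V_Y are the limiting covariance matrices of the scaled iterate errors
% of SGD driven by input sequences X and Y, respectively.
\[
  V_X \preceq_L V_Y
  \iff
  V_Y - V_X \text{ is positive semidefinite}
  \iff
  z^\top V_X z \le z^\top V_Y z \quad \text{for all } z \in \mathbb{R}^d .
\]
% Consequence for a weighted error with any positive semidefinite weight W:
% the trace of a product of two positive semidefinite matrices is nonnegative,
% so tr(W (V_Y - V_X)) >= 0.
\[
  \operatorname{tr}(W V_X) \le \operatorname{tr}(W V_Y),
  \qquad W \succeq 0 .
\]
```

Under this reading, taking W to be the identity gives the ordinary (unweighted) limiting MSE, so a more efficient input sequence would yield an error that is no larger in every direction z as well as in aggregate.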