Stochastic gradient algorithms are widely used for both optimization and sampling in large-scale learning and inference problems. In practice, however, tuning these algorithms is typically done using heuristics and trial-and-error rather than rigorous, generalizable theory. To address this gap between theory and practice, we provide novel insights into the effect of tuning parameters by characterizing the large-sample behavior of iterates of a very general class of preconditioned stochastic gradient algorithms with fixed step size. In the optimization setting, our results show that iterate averaging with a large fixed step size can result in statistically efficient approximation of the (local) M-estimator. In the sampling context, our results show that with appropriate choices of tuning parameters, the limiting stationary covariance can match either the Bernstein--von Mises limit of the posterior, adjustments to the posterior for model misspecification, or the asymptotic distribution of the MLE; and that with naive tuning, the limit corresponds to none of these. Moreover, we argue that an essentially independent sample from the stationary distribution can be obtained after a fixed number of passes over the dataset. We validate our asymptotic results in realistic finite-sample regimes via several experiments using simulated and real data. Overall, we demonstrate that properly tuned stochastic gradient algorithms with constant step size offer a computationally efficient and statistically robust approach to obtaining point estimates or posterior-like samples.
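To make the optimization claim concrete, the following is a minimal sketch (not the paper's implementation) of constant step-size stochastic gradient descent with Polyak--Ruppert iterate averaging on a simulated linear regression; the data-generating process, step size, and number of passes are illustrative assumptions only.

```python
# Minimal sketch: fixed (non-decaying) step-size SGD with iterate averaging,
# illustrating "iterate averaging with a large fixed step size" for M-estimation.
# All constants below are assumptions chosen for illustration, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: y = X @ theta_star + noise
n, d = 10_000, 5
theta_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ theta_star + rng.normal(size=n)

step_size = 0.1   # held constant across all updates (no decay schedule)
n_passes = 5      # a fixed number of passes over the dataset
theta = np.zeros(d)
theta_sum = np.zeros(d)
n_updates = 0

for _ in range(n_passes):
    for i in rng.permutation(n):
        # Stochastic gradient of the squared-error loss at a single observation.
        grad = (X[i] @ theta - y[i]) * X[i]
        theta = theta - step_size * grad
        theta_sum += theta
        n_updates += 1

theta_bar = theta_sum / n_updates  # averaged iterate: the point estimate
print("averaged-iterate error:", np.linalg.norm(theta_bar - theta_star))
print("last-iterate error:    ", np.linalg.norm(theta - theta_star))
```

In this sketch the last iterate keeps fluctuating around the minimizer because the step size never decays, while the averaged iterate concentrates near the M-estimator, which is the contrast the abstract refers to.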