Tuning of stochastic gradient algorithms (SGAs) for optimization and sampling is often based on heuristics and trial-and-error rather than generalizable theory. We address this theory--practice gap by characterizing the statistical asymptotics of SGAs via a joint step-size--sample-size scaling limit. We show that iterate averaging with a large fixed step size is robust to the choice of tuning parameters and asymptotically has covariance proportional to that of the MLE sampling distribution. We also prove a Bernstein--von Mises-like theorem to guide tuning, including for generalized posteriors that are robust to model misspecification. Numerical experiments validate our results in realistic finite-sample regimes.
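To make the central idea concrete, here is a minimal sketch of stochastic gradient descent with a large fixed step size combined with Polyak--Ruppert iterate averaging, on a toy least-squares problem. All names, the toy model, and the parameter settings are illustrative assumptions, not the paper's experimental setup; the sketch only shows the mechanism the abstract refers to (the raw iterates oscillate, while their running average concentrates near the MLE).

```python
import numpy as np

# Illustrative sketch (not the paper's method): SGD with a large *fixed*
# step size plus iterate averaging, on synthetic linear regression.
rng = np.random.default_rng(0)
n, d = 10_000, 5
theta_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ theta_true + rng.normal(size=n)

def grad_estimate(theta, batch_size=32):
    """Unbiased stochastic gradient of the average least-squares loss."""
    idx = rng.integers(n, size=batch_size)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ theta - yb) / batch_size

step_size = 0.05          # fixed step size: no decay schedule
num_iters = 20_000
theta = np.zeros(d)
theta_bar = np.zeros(d)   # running average of the iterates

for t in range(1, num_iters + 1):
    theta -= step_size * grad_estimate(theta)
    theta_bar += (theta - theta_bar) / t   # online Polyak--Ruppert average

# theta wanders in a stationary region set by the step size, while
# theta_bar settles near the MLE (here available in closed form).
print("averaged iterate:", theta_bar)
print("MLE (closed form):", np.linalg.lstsq(X, y, rcond=None)[0])
```

In this toy setting the averaged iterate is insensitive to the exact choice of `step_size` and `batch_size`, which is the robustness-to-tuning behavior the abstract describes.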