We design step-size schemes that make stochastic gradient descent (SGD) adaptive to (i) the noise $\sigma^2$ in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly convex functions with condition number $\kappa$, we first prove that $T$ iterations of SGD with Nesterov acceleration and exponentially decreasing step-sizes can achieve a near-optimal $\tilde{O}(\exp(-T/\sqrt{\kappa}) + \sigma^2/T)$ convergence rate. Under a relaxed assumption on the noise, with the same step-size scheme and knowledge of the smoothness, we prove that SGD can achieve an $\tilde{O}(\exp(-T/\kappa) + \sigma^2/T)$ rate. In order to be adaptive to the smoothness, we use a stochastic line-search (SLS) and show (via upper and lower bounds) that SGD converges at the desired rate, but only to a neighbourhood of the solution. Next, we use SGD with an offline estimate of the smoothness and prove convergence to the minimizer. However, its convergence is slowed down in proportion to the estimation error, and we prove a lower bound justifying this slowdown. Compared to other step-size schemes, we empirically demonstrate the effectiveness of exponential step-sizes coupled with a novel variant of SLS.
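As a concrete illustration of the exponentially decreasing step-size scheme, the sketch below runs SGD with $\eta_t = (1/L)\,\alpha^t$ and decay factor $\alpha = (\beta/T)^{1/T}$, so the step-size interpolates between the deterministic choice $1/L$ and the noise-dominated scale $\beta/(LT)$ over $T$ iterations. This is a minimal sketch: the constants ($\beta$, the $1/L$ initialization) and the noisy-quadratic test problem are illustrative assumptions, not the paper's exact tuning.

```python
import numpy as np

def sgd_exponential(grad_fn, x0, T, L, beta=1.0, rng=None):
    """SGD with exponentially decreasing step-sizes.

    Uses eta_t = (1/L) * alpha**t with alpha = (beta/T)**(1/T), so the
    step-size decays smoothly from 1/L at t=0 to roughly beta/(L*T) at t=T.
    grad_fn(x, rng) must return an unbiased stochastic gradient at x.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    alpha = (beta / T) ** (1.0 / T)  # per-iteration decay factor
    x = np.array(x0, dtype=float)
    eta = 1.0 / L                    # initial step-size set by the smoothness L
    for _ in range(T):
        x -= eta * grad_fn(x, rng)
        eta *= alpha                 # exponential decay of the step-size
    return x

# Illustrative test: a strongly convex quadratic with additive gradient noise.
if __name__ == "__main__":
    A = np.diag([1.0, 10.0])                       # condition number kappa = 10
    b = np.array([1.0, -2.0])
    L = 10.0                                       # largest eigenvalue of A
    noisy_grad = lambda x, rng: A @ x - b + 0.1 * rng.standard_normal(2)
    x_hat = sgd_exponential(noisy_grad, x0=np.zeros(2), T=5000, L=L)
    print("estimate:", x_hat, " minimizer:", np.linalg.solve(A, b))
```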
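The stochastic line-search used to adapt to the smoothness can likewise be sketched in a few lines: each step backtracks on the sampled mini-batch until an Armijo condition holds on that same batch. The constants (`eta_max`, `c`, the shrink factor) are illustrative assumptions, and this is the standard Armijo-style SLS template rather than the novel variant evaluated in the experiments.

```python
import numpy as np

def sls_step(f_b, grad_b, x, eta_max=1.0, c=0.5, shrink=0.8, max_backtracks=100):
    """One SGD step with an Armijo-style stochastic line-search (SLS).

    f_b and grad_b evaluate the loss and gradient on the SAME sampled
    mini-batch. Backtrack from eta_max until the Armijo condition
        f_b(x - eta * g) <= f_b(x) - c * eta * ||g||^2
    holds on that batch, then take the step.
    """
    g = grad_b(x)
    fx = f_b(x)
    g_sq = float(g @ g)
    eta = eta_max
    for _ in range(max_backtracks):
        if f_b(x - eta * g) <= fx - c * eta * g_sq:
            break
        eta *= shrink  # step-size too aggressive on this batch; backtrack
    return x - eta * g, eta
```

Because the Armijo condition is checked on the sampled batch rather than the full objective, the accepted step-size adapts to the local smoothness without requiring $L$ up front, which is exactly the adaptivity the abstract refers to.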