Recent studies have provided both empirical and theoretical evidence showing that heavy tails can emerge in stochastic gradient descent (SGD) in a variety of scenarios. Such heavy tails can potentially result in iterates with diverging variance, which hinders the use of conventional convergence analysis techniques that rely on the existence of second-order moments. In this paper, we provide convergence guarantees for SGD under state-dependent, heavy-tailed noise with potentially infinite variance, for a class of strongly convex objectives. In the case where the $p$-th moment of the noise exists for some $p\in [1,2)$, we first identify a condition on the Hessian, coined '$p$-positive (semi-)definiteness', that leads to an interesting interpolation between positive semi-definite matrices ($p=2$) and diagonally dominant matrices with non-negative diagonal entries ($p=1$). Under this condition, we then provide a convergence rate for the distance to the global optimum in $L^p$. Furthermore, we provide a generalized central limit theorem, which shows that the properly scaled Polyak-Ruppert average converges weakly to a multivariate $\alpha$-stable random vector. Our results indicate that, even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum without requiring any modification to either the loss function or the algorithm itself, as is typically required in robust statistics. We demonstrate the implications of our results for applications such as linear regression and generalized linear models subject to heavy-tailed data.
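To make the setting concrete, the following minimal simulation sketch (not taken from the paper; all names, distributions, and parameter values are illustrative assumptions) runs plain SGD on a strongly convex quadratic where the gradient noise has Pareto-type tails with index $\alpha = 1.5$, so its variance is infinite while its $p$-th moment is finite for every $p < \alpha$. For simplicity the noise here is additive and i.i.d. rather than state-dependent. The script tracks both the last iterate and the Polyak-Ruppert running average:

```python
# Minimal sketch: constant-step SGD on a strongly convex quadratic with
# heavy-tailed gradient noise (infinite variance), plus Polyak-Ruppert
# averaging of the iterates. All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

d = 5                  # problem dimension
alpha = 1.5            # noise tail index: E|xi|^p < inf only for p < alpha
n_iters = 50_000
step = 1e-3            # constant step size

A = 2.0 * np.eye(d)    # Hessian of f(x) = (x - x*)' A (x - x*) / 2
x_star = np.ones(d)    # global optimum

def heavy_tailed_noise(size):
    """Symmetric Pareto-type noise: P(|X| > t) ~ t^(-alpha), so the
    variance is infinite for alpha < 2 while the mean exists for alpha > 1."""
    u = rng.uniform(size=size)
    magnitude = u ** (-1.0 / alpha) - 1.0      # inverse-CDF Pareto tail
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude

x = np.zeros(d)
x_bar = np.zeros(d)    # Polyak-Ruppert running average of the iterates
for k in range(1, n_iters + 1):
    grad = A @ (x - x_star) + heavy_tailed_noise(d)  # stochastic gradient
    x = x - step * grad
    x_bar += (x - x_bar) / k                         # online average

print("last-iterate error     :", np.linalg.norm(x - x_star))
print("averaged-iterate error :", np.linalg.norm(x_bar - x_star))
```

Running this sketch typically shows the averaged iterate sitting closer to $x^*$ than the last iterate, which occasionally suffers large excursions caused by the heavy-tailed noise; the paper's generalized central limit theorem characterizes the limiting distribution of the properly scaled average as $\alpha$-stable rather than Gaussian.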