Recently, several studies have considered the stochastic optimization problem under heavy-tailed noise, i.e., the difference between the stochastic gradient and the true gradient is assumed to have a finite $p$-th moment (say, bounded by $\sigma^{p}$ for some $\sigma\geq0$) where $p\in(1,2]$. This assumption not only generalizes the traditional finite-variance assumption ($p=2$) but has also been observed in practice across several different tasks. Under this challenging assumption, much new progress has been made for both convex and nonconvex problems; however, most of it considers only smooth objectives. In contrast, the problem remains far from fully explored and understood when the objective is nonsmooth. This paper aims to fill this crucial gap by providing a comprehensive analysis of stochastic nonsmooth convex optimization with heavy-tailed noise. We revisit a simple clipping-based algorithm, which so far has only been proved to converge in expectation, and only under the additional assumption of strong convexity. Under appropriate choices of parameters, for both convex and strongly convex functions, we not only establish the first high-probability rates but also give refined in-expectation bounds compared with existing work. Remarkably, all of our results are optimal (or nearly optimal up to logarithmic factors) with respect to the time horizon $T$, even when $T$ is unknown in advance. Additionally, we show how to make the algorithm parameter-free with respect to $\sigma$; in other words, the algorithm can guarantee convergence without any prior knowledge of $\sigma$.
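To make the clipping idea concrete: the abstract does not specify the paper's exact step sizes or clipping levels, so below is only a minimal sketch of a generic clipped stochastic (sub)gradient method under the stated heavy-tailed assumption $\mathbb{E}\|g_t-\nabla f(x_t)\|^p\leq\sigma^p$, $p\in(1,2]$. The names `grad_oracle`, `step_sizes`, and `clip_levels` are illustrative placeholders, not the paper's notation.

```python
import numpy as np

def clipped_sgd(grad_oracle, x0, T, step_sizes, clip_levels):
    """Sketch of a clipping-based stochastic (sub)gradient method.

    At step t, the stochastic (sub)gradient g_t is rescaled to have norm
    at most clip_levels[t], i.e. g_t <- min(1, tau_t / ||g_t||) * g_t,
    which tames heavy-tailed noise with only a finite p-th moment.
    """
    x = np.asarray(x0, dtype=float)
    avg = x.copy()                                # running average of iterates
    for t in range(T):
        g = grad_oracle(x)                        # stochastic (sub)gradient at x
        norm = np.linalg.norm(g)
        if norm > clip_levels[t]:
            g = g * (clip_levels[t] / norm)       # clip to norm tau_t
        x = x - step_sizes[t] * g                 # standard (sub)gradient step
        avg += (x - avg) / (t + 2)                # incremental average update
    return avg                                    # averaged iterate, as typical for convex rates
```

Returning the averaged iterate matches the usual convention for convex (non-strongly-convex) rates; for strongly convex objectives, papers in this line often analyze a differently weighted average or the last iterate instead.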