In Online Convex Optimization (OCO), when the stochastic gradient has finite variance, many algorithms are provably effective and guarantee sublinear regret. However, only limited results are known when the gradient estimate is heavy-tailed, i.e., the stochastic gradient admits only a finite $\mathsf{p}$-th central moment for some $\mathsf{p}\in\left(1,2\right]$. Motivated by this gap, this work revisits classical OCO algorithms (e.g., Online Gradient Descent) in the more challenging heavy-tailed setting. Under the standard bounded-domain assumption, we establish new regret bounds for these classical methods without any algorithmic modification. Remarkably, these regret bounds are fully optimal in all parameters (and can be achieved even without knowing $\mathsf{p}$), suggesting that OCO with heavy tails can be solved effectively without any extra operation (e.g., gradient clipping). Our new results have several applications. A particularly notable one is the first provable and optimal convergence result for nonsmooth nonconvex optimization under heavy-tailed noise without gradient clipping. Furthermore, we explore broader settings (e.g., smooth OCO) and extend our ideas to optimistic algorithms to handle different cases simultaneously.
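For concreteness, here is a minimal sketch of the setting described above; the symbols $f_t$, $\mathcal{X}$, $\widehat{g}_t$, $\eta_t$, and $\Pi_{\mathcal{X}}$ follow standard OCO conventions and are illustrative rather than this paper's exact notation. The learner plays $x_t$ in a bounded convex set $\mathcal{X}$, receives a stochastic gradient $\widehat{g}_t$ of the loss $f_t$, and aims to control the regret
\[
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x\in\mathcal{X}} \sum_{t=1}^{T} f_t(x).
\]
The heavy-tailed assumption replaces the usual finite-variance condition with a finite $\mathsf{p}$-th central moment,
\[
\mathbb{E}\!\left[\big\|\widehat{g}_t - \nabla f_t(x_t)\big\|^{\mathsf{p}}\right] \;\le\; \sigma^{\mathsf{p}}, \qquad \mathsf{p}\in(1,2],
\]
where $\nabla f_t(x_t)$ may be read as a subgradient in the nonsmooth case, and the unmodified Online Gradient Descent update referred to above is simply
\[
x_{t+1} \;=\; \Pi_{\mathcal{X}}\!\big(x_t - \eta_t\,\widehat{g}_t\big),
\]
with no clipping or other alteration of $\widehat{g}_t$.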