Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

Stochastic first-order methods are standard for training large-scale machine learning models. Because of their randomness, a particular run of an algorithm may return a highly suboptimal objective value, whereas theoretical guarantees are usually proved only for the expectation of the objective value. Thus, it is essential to guarantee theoretically that algorithms achieve a small objective residual with high probability. Existing methods for non-smooth stochastic convex optimization have complexity bounds whose dependence on the confidence level is either negative-power, or logarithmic but only under the additional assumption of a sub-Gaussian (light-tailed) noise distribution, which may not hold in practice. In our paper, we resolve this issue and derive the first high-probability convergence results with logarithmic dependence on the confidence level for non-smooth convex stochastic optimization problems with non-sub-Gaussian (heavy-tailed) noise. To derive our results, we propose novel stepsize rules for two stochastic methods with gradient clipping. Moreover, our analysis works for generalized smooth objectives with Hölder-continuous gradients, and for both methods, we provide an extension for strongly convex problems. Finally, our results imply that the first (accelerated) method we consider also has optimal iteration and oracle complexity in all the regimes, and the second one is optimal in the non-smooth setting.
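To make the phrase "stochastic methods with gradient clipping" concrete, the following is a minimal sketch of a generic clipped stochastic subgradient method with iterate averaging. It is not the paper's algorithm: the constant `stepsize` and `clip_level` stand in for the stepsize rules the abstract refers to, which are not given here, and the names `clip`, `clipped_sgd`, and `subgrad_oracle` are hypothetical. The demo objective (the l1 norm) and the Student-t noise are likewise illustrative assumptions.

```python
import numpy as np


def clip(g, lam):
    """Rescale g so that its Euclidean norm is at most lam."""
    norm = np.linalg.norm(g)
    if norm <= lam:
        return g
    return g * (lam / norm)


def clipped_sgd(subgrad_oracle, x0, stepsize, clip_level, n_iters):
    """Generic clipped stochastic subgradient method (illustrative only).

    Uses a constant stepsize and clipping level (assumptions, not the
    paper's rules) and returns the running average of the iterates.
    """
    x = np.asarray(x0, dtype=float).copy()
    avg = np.zeros_like(x)
    for k in range(n_iters):
        # Clip the stochastic subgradient before taking the step.
        g = clip(subgrad_oracle(x), clip_level)
        x = x - stepsize * g
        # Incremental update of the average of x_1, ..., x_{k+1}.
        avg += (x - avg) / (k + 1)
    return avg


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def noisy_l1_subgrad(x):
        # Subgradient of f(x) = ||x||_1 plus heavy-tailed Student-t noise
        # (3 degrees of freedom: finite variance, not sub-Gaussian).
        return np.sign(x) + rng.standard_t(df=3, size=x.shape)

    x0 = np.full(5, 5.0)
    x_avg = clipped_sgd(noisy_l1_subgrad, x0, stepsize=0.01,
                        clip_level=5.0, n_iters=2000)
    print("f(x0)    =", np.abs(x0).sum())
    print("f(x_avg) =", np.abs(x_avg).sum())
```

Clipping bounds the length of every step by `stepsize * clip_level`, so a single very large noise sample cannot move the iterate arbitrarily far. This is the intuition for why clipping helps when the noise is heavy-tailed.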
