In recent years, the interest of the optimization and machine learning communities in the high-probability convergence of stochastic optimization methods has been growing. One of the main reasons for this is that high-probability complexity bounds are more accurate and less studied than in-expectation ones. However, state-of-the-art (SOTA) high-probability non-asymptotic convergence results are derived under strong assumptions, such as the boundedness of the gradient noise variance or of the objective's gradient itself. In this paper, we propose several algorithms with high-probability convergence results under less restrictive assumptions. In particular, we derive new high-probability convergence results under the assumption that the gradient/operator noise has bounded central $\alpha$-th moment for $\alpha \in (1,2]$ in the following setups: (i) smooth non-convex / Polyak-Łojasiewicz / convex / strongly convex / quasi-strongly convex minimization problems, (ii) Lipschitz / star-cocoercive and monotone / quasi-strongly monotone variational inequalities. These results justify the use of the considered methods for solving problems that do not fit standard functional classes studied in stochastic optimization.
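For concreteness, the bounded central $\alpha$-th moment condition mentioned above is typically formalized as follows (a sketch in standard notation; the symbols $\nabla f(x,\xi)$ and $\sigma$ are illustrative and may differ from the paper's exact statement):
\[
\mathbb{E}_{\xi}\big[\|\nabla f(x,\xi) - \nabla f(x)\|^{\alpha}\big] \le \sigma^{\alpha}, \qquad \alpha \in (1,2],
\]
where $\nabla f(x,\xi)$ denotes the stochastic gradient (or operator) estimate at $x$ and $\sigma > 0$ is a noise parameter. The case $\alpha = 2$ recovers the standard bounded-variance assumption, while $\alpha < 2$ permits heavy-tailed noise whose variance may be infinite.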