In this work we study high-probability bounds for stochastic subgradient methods under heavy-tailed noise. In this setting the noise is only assumed to have finite variance, as opposed to the sub-Gaussian distributions for which standard subgradient methods are known to enjoy high-probability bounds. We analyze a clipped version of the projected stochastic subgradient method, in which subgradient estimates are truncated whenever their norms are large. We show that this clipping strategy leads to near-optimal anytime and finite-horizon bounds for many classical averaging schemes. Preliminary experiments are reported to support the validity of the method.
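To make the clipping strategy concrete, the following is a minimal sketch of one possible form of a clipped projected stochastic subgradient step with uniform averaging. The function names (`subgrad_oracle`, `project`), the fixed step size, and the rescaling rule used for truncation are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def clipped_projected_subgradient(x0, subgrad_oracle, project,
                                  steps, step_size, clip_level):
    """Sketch of a clipped projected stochastic subgradient method.

    Subgradient estimates whose norm exceeds `clip_level` are rescaled
    (truncated) before the projected step. The clipping rule and the
    uniform averaging scheme here are assumptions for illustration.
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for _ in range(steps):
        g = subgrad_oracle(x)                 # stochastic subgradient estimate
        norm = np.linalg.norm(g)
        if norm > clip_level:                 # truncate large-norm estimates
            g = g * (clip_level / norm)
        x = project(x - step_size * g)        # projected subgradient step
        iterates.append(x.copy())
    return np.mean(iterates, axis=0)          # simple uniform averaging
```

For instance, with `project = lambda x: np.clip(x, -1.0, 1.0)` (projection onto a box) and an oracle returning a subgradient corrupted by finite-variance noise, the returned average plays the role of one of the classical averaging schemes the bounds cover.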