We propose a new stochastic gradient method that uses recorded past loss values to reduce variance. Our method can be interpreted as a new stochastic variant of the Polyak stepsize that converges globally without assuming interpolation. It introduces auxiliary variables, one per data point, that track the loss of each data point. We provide a global convergence theory for our method by showing that it can be interpreted as a special variant of online SGD. The new method stores only a single scalar per data point, opening up new applications for variance reduction where memory is the bottleneck.
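To make the idea concrete, below is a minimal sketch of a stochastic Polyak-type step in which a stored scalar per data point tracks that point's loss and replaces the interpolation assumption as the target in the stepsize. This is not the authors' exact algorithm: the target-update rule, the damping parameter `lmbda`, and the function signatures are illustrative assumptions.

```python
import numpy as np

def sps_with_tracked_losses(f, grad_f, w0, n, lmbda=0.5, epochs=10, eps=1e-12, seed=0):
    """Sketch: stochastic Polyak-type steps with one tracked loss scalar per data point.

    f(w, i)      -> loss of sample i at parameters w          (assumed interface)
    grad_f(w, i) -> gradient of that loss                      (assumed interface)
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    targets = np.zeros(n)  # one scalar per data point: the only extra memory used
    for _ in range(epochs * n):
        i = rng.integers(n)
        loss_i = f(w, i)
        g_i = grad_f(w, i)
        # Polyak-type stepsize: excess of the current loss over the tracked target,
        # normalized by the squared gradient norm (eps guards against division by zero).
        step = max(loss_i - targets[i], 0.0) / (np.dot(g_i, g_i) + eps)
        w -= step * g_i
        # Record a damped version of the observed loss as the new target for point i
        # (an illustrative choice of target update, not necessarily the paper's rule).
        targets[i] = (1.0 - lmbda) * targets[i] + lmbda * loss_i
    return w, targets
```

As a usage example, for a least-squares problem one would pass `f = lambda w, i: 0.5 * (A[i] @ w - b[i]) ** 2` and `grad_f = lambda w, i: (A[i] @ w - b[i]) * A[i]`; the array `targets` is the per-sample scalar memory referred to in the abstract.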