The noise in stochastic gradient descent (SGD), caused by minibatch sampling, remains poorly understood despite its enormous practical importance for training efficiency and generalization. In this work, we study minibatch noise in SGD. Motivated by the observation that minibatch sampling does not always cause fluctuation, we set out to identify the conditions under which minibatch noise emerges. We first derive analytically solvable results for linear regression under various settings and compare them to the approximations commonly used to understand SGD noise. We show that some degree of mismatch between model and data complexity is needed for SGD to produce noise, and that such mismatch may arise from static noise in the labels or the input, from regularization, or from underparametrization. Our results motivate a more accurate general formulation of minibatch noise.
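The observation that minibatch sampling does not always cause fluctuation can be illustrated with a minimal numerical sketch (not from the paper; the setup and names below are illustrative). In noiseless, perfectly fit linear regression, every minibatch gradient vanishes at the minimizer, so there is no minibatch noise; adding label noise leaves nonzero residuals, and subsampling them makes the gradient fluctuate across minibatches:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 100, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)

def minibatch_grad_std(y, w, batch_size=10, n_batches=200):
    """Average per-coordinate std. dev., across random minibatches, of the
    gradient of the mean-squared-error loss 0.5 * ||X_b w - y_b||^2 / batch_size."""
    grads = []
    for _ in range(n_batches):
        idx = rng.choice(n, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        grads.append(Xb.T @ (Xb @ w - yb) / batch_size)
    return float(np.std(grads, axis=0).mean())

# Case 1: noiseless labels, model evaluated at the interpolating solution.
# All residuals are zero, so every minibatch gradient is (numerically) zero.
y_clean = X @ w_true
std_clean = minibatch_grad_std(y_clean, w_true)

# Case 2: static label noise. At the full-batch least-squares minimizer the
# residuals are nonzero, so subsampling them yields fluctuating gradients.
y_noisy = y_clean + 0.5 * rng.normal(size=n)
w_star, *_ = np.linalg.lstsq(X, y_noisy, rcond=None)
std_noisy = minibatch_grad_std(y_noisy, w_star)

print(std_clean)  # ~0 (up to floating-point error)
print(std_noisy)  # clearly nonzero
```

This toy experiment matches the abstract's claim: the minibatch noise appears only once a mismatch (here, static label noise) is introduced.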