The noise in stochastic gradient descent (SGD), caused by minibatch sampling, is poorly understood despite its practical importance in deep learning. In this work, we study the nature of SGD noise and fluctuations. We show that some degree of mismatch between model and data complexity is needed for SGD to ``stir'' noise; such a mismatch may be due to label noise, input noise, regularization, or underparametrization. Compared with previous works, the present work focuses on deriving exactly solvable analytical results. Our work also motivates a more accurate general formulation for describing minibatch noise, and we show that SGD noise takes different shapes and strengths in different kinds of minima.
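A minimal sketch (not from the paper) of the central claim: at a minimum where the model fits the data exactly (a realizable, noiseless regression), every minibatch gradient vanishes and SGD noise is zero; adding label noise creates the model-data mismatch from which minibatch noise emerges. The setup and the helper name grad_noise_std are hypothetical, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 1000, 10, 32
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)

def grad_noise_std(label_noise_std, trials=2000):
    """Std of minibatch gradients at the full-batch minimum (hypothetical helper)."""
    # Labels with optional noise; the noiseless case is exactly realizable by the model.
    y = X @ w_star + label_noise_std * rng.normal(size=n)
    # Full-batch minimizer of the mean squared loss.
    w_min = np.linalg.lstsq(X, y, rcond=None)[0]
    grads = []
    for _ in range(trials):
        idx = rng.choice(n, size=batch, replace=False)
        # Minibatch gradient of 0.5 * mean squared error, evaluated at the minimum.
        g = X[idx].T @ (X[idx] @ w_min - y[idx]) / batch
        grads.append(g)
    return np.linalg.norm(np.std(grads, axis=0))

print("no label noise: ", grad_noise_std(0.0))   # ~0: no minibatch noise at the minimum
print("label noise 0.5:", grad_noise_std(0.5))   # > 0: noise emerges from the mismatch
```

With zero label noise the least-squares solution recovers w_star exactly, so each per-example residual is zero and minibatch sampling has nothing to fluctuate over; with label noise the residuals are nonzero and the minibatch gradients acquire a nontrivial covariance, consistent with the abstract's claim that a model-data mismatch is needed for SGD to stir noise.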