We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods, learning an estimator (e.g. M-estimator, shallow neural network, ...) from observations on Gaussian data with empirical risk minimization. This includes widely used algorithms such as stochastic gradient descent (SGD) or Nesterov acceleration. The obtained equations match those resulting from the discretization of dynamical mean-field theory (DMFT) equations from statistical physics when applied to gradient flow. Our proof method allows us to give an explicit description of how memory kernels build up in the effective dynamics, and to include non-separable update functions, allowing datasets with non-identity covariance matrices. Finally, we provide numerical implementations of the equations for SGD with a generic extensive batch size and constant learning rates.
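As a rough illustration of the setting studied (not the paper's implementation), the sketch below runs SGD with an extensive batch size and a constant learning rate on a Gaussian-data empirical risk; the ridge-regularized squared loss, the teacher vector, and all parameter values are hypothetical choices made for concreteness.

```python
# Minimal sketch, assuming a ridge-regularized squared loss on i.i.d. Gaussian data.
import numpy as np

rng = np.random.default_rng(0)

n, d = 2000, 1000          # high-dimensional regime: n, d large with n/d fixed
X = rng.standard_normal((n, d)) / np.sqrt(d)   # Gaussian data, identity covariance
w_star = rng.standard_normal(d)                # hypothetical teacher vector
y = X @ w_star + 0.1 * rng.standard_normal(n)  # noisy observations

lam = 0.1        # ridge penalty (assumed)
eta = 0.5        # constant learning rate
b = n // 2       # extensive batch size: a fixed fraction of the dataset
T = 200          # number of SGD steps

w = np.zeros(d)
for t in range(T):
    idx = rng.choice(n, size=b, replace=False)              # draw an extensive batch
    grad = X[idx].T @ (X[idx] @ w - y[idx]) / b + lam * w   # empirical risk gradient
    w -= eta * grad                                         # plain SGD update

print("final empirical risk:", 0.5 * np.mean((X @ w - y) ** 2) + 0.5 * lam * w @ w)
```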