The study of implicit regularization induced by gradient-based optimization is a longstanding line of research. In the present paper, we characterize the implicit regularization of momentum gradient descent (MGD) with early stopping by comparing it with explicit $\ell_2$-regularization (ridge). Specifically, we study MGD in continuous time, through the so-called momentum gradient flow (MGF), and show that for least-squares regression its behavior is closer to ridge than that of gradient descent (GD) [Ali et al., 2019]. Moreover, we prove that under the calibration $t=\sqrt{2/\lambda}$, where $t$ is the time parameter in MGF and $\lambda$ is the tuning parameter in ridge regression, the risk of MGF is no more than 1.54 times that of ridge. In particular, under optimal tuning, the relative Bayes risk of MGF to ridge lies between 1 and 1.035. Numerical experiments strongly support our theoretical results.
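One heuristic way to read the calibration $t=\sqrt{2/\lambda}$ is through the strong-regularization regime (large $\lambda$, small $t$). A flow started from rest at zero behaves like $\beta(t)\approx \tfrac{t^2}{2}X^\top y/n$ near $t=0$, while ridge with large $\lambda$ gives $\beta\approx X^\top y/(n\lambda)$. Matching the two gives $t=\sqrt{2/\lambda}$, just as matching gradient flow's $\beta(t)\approx t\,X^\top y/n$ gives the calibration $t=1/\lambda$ used by Ali et al. (2019). The Python sketch below checks this numerically. It is a minimal illustration under stated assumptions, not the authors' code. It assumes the loss $f(\beta)=\frac{1}{2n}\|y-X\beta\|_2^2$. It also stands in for MGF with an assumed damped second-order flow $\ddot\beta+\gamma\dot\beta=-\nabla f(\beta)$ with $\beta(0)=\dot\beta(0)=0$, where the damping $\gamma$ is a hypothetical parameter. The paper's exact MGF equation may differ.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge estimate: argmin (1/(2n))||y - X b||^2 + (lam/2)||b||^2."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def gradient_flow(X, y, t):
    """Closed-form gradient flow b' = -grad f(b) from b(0) = 0 (the GD limit)."""
    n, _ = X.shape
    s, V = np.linalg.eigh(X.T @ X / n)
    g = V.T @ (X.T @ y / n)  # -grad f(0) expressed in the eigenbasis
    # Per-direction factor (1 - exp(-t s)) / s, whose limit as s -> 0 is t.
    safe_s = np.where(s > 1e-12, s, 1.0)
    factor = np.where(s > 1e-12, -np.expm1(-t * s) / safe_s, t)
    return V @ (factor * g)


def momentum_flow(X, y, t, gamma=1.0, steps=2000):
    """ASSUMED stand-in for MGF: b'' + gamma * b' = -grad f(b), b(0) = b'(0) = 0.

    Integrated with semi-implicit Euler; gamma is a hypothetical damping value.
    """
    n, p = X.shape
    S, g = X.T @ X / n, X.T @ y / n
    dt = t / steps
    b, v = np.zeros(p), np.zeros(p)
    for _ in range(steps):
        v = v + dt * (g - S @ b - gamma * v)  # -grad f(b) = g - S b
        b = b + dt * v
    return b


rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(50)

lam = 1e4  # strong regularization, i.e. very early stopping
b_ridge = ridge(X, y, lam)
b_gf = gradient_flow(X, y, t=1.0 / lam)            # calibration t = 1/lambda
b_mgf = momentum_flow(X, y, t=np.sqrt(2.0 / lam))  # calibration t = sqrt(2/lambda)


def rel_err(b):
    """Relative distance of an estimate from the ridge estimate."""
    return np.linalg.norm(b - b_ridge) / np.linalg.norm(b_ridge)


print(rel_err(b_gf), rel_err(b_mgf))
```

The random design, $\lambda=10^4$ and $\gamma=1$ are arbitrary choices for illustration. This only probes the small-$t$ end of the calibration. The risk bounds stated in the abstract are a separate, global comparison of MGF and ridge.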