In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over diagonal linear networks. We prove the convergence of GD and SGD with macroscopic stepsizes in an overparametrised regression setting and characterise their solutions through an implicit regularisation problem. Our crisp characterisation leads to qualitative insights about the impact of stochasticity and stepsizes on the recovered solution. Specifically, we show that large stepsizes consistently benefit SGD for sparse regression problems, while they can hinder the recovery of sparse solutions for GD. These effects are magnified for stepsizes in a tight window just below the divergence threshold, in the ``edge of stability'' regime. Our findings are supported by experimental results.
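As a minimal sketch of the setting (assuming the common two-layer diagonal parametrisation $\beta_w = u \odot v$ and the square loss, which are not spelled out in this abstract), the training objective and the (S)GD iterates take the form
\[
  L(w) \;=\; \frac{1}{2n} \sum_{i=1}^{n} \big( \langle u \odot v,\, x_i \rangle - y_i \big)^2,
  \qquad
  w_{k+1} \;=\; w_k - \gamma \,\nabla L_{\mathcal{B}_k}(w_k),
\]
where $w = (u, v)$, $\gamma > 0$ is the stepsize, and $L_{\mathcal{B}_k}$ denotes the loss over the full dataset for GD or over a random mini-batch $\mathcal{B}_k$ for SGD; ``macroscopic'' stepsizes refer to $\gamma$ of order one rather than vanishingly small.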