Stochastic gradient descent (SGD) has been shown to generalize well in many deep learning applications. In practice, one often runs SGD with a geometrically decaying stepsize, i.e., a constant initial stepsize followed by multiple geometric stepsize decays, and uses the last iterate as the output. This kind of SGD is known to be nearly minimax optimal for classical finite-dimensional linear regression problems (Ge et al., 2019). However, a sharp analysis for the last iterate of SGD in the overparameterized setting is still open. In this paper, we provide a problem-dependent analysis of the last iterate risk bounds of SGD with decaying stepsize, for (overparameterized) linear regression problems. In particular, for last iterate SGD with (tail) geometrically decaying stepsize, we prove nearly matching upper and lower bounds on the excess risk. Moreover, we provide an excess risk lower bound for last iterate SGD with polynomially decaying stepsize and demonstrate the advantage of geometrically decaying stepsize in an instance-wise manner, which complements the minimax rate comparison made in prior work.
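For concreteness, the following is a minimal sketch (not the paper's exact schedule or constants) of one-pass, last-iterate SGD for linear regression with a constant warmup stepsize followed by tail geometric stepsize decay; the function name, initial stepsize, decay factor, and number of decay phases are illustrative assumptions.

```python
import numpy as np

def last_iterate_sgd(X, y, eta0=0.1, decay=0.5, n_phases=4):
    """One-pass SGD for linear regression with tail geometrically decaying stepsize.

    The stepsize is held constant at eta0 over the first half of the samples,
    then multiplied by `decay` at the start of each of `n_phases` equal-length
    phases over the remaining (tail) samples. The last iterate is returned.
    """
    n, d = X.shape
    w = np.zeros(d)
    warmup = n // 2                          # constant-stepsize phase
    phase_len = max(1, (n - warmup) // n_phases)
    eta = eta0
    for t in range(n):
        if t >= warmup and (t - warmup) % phase_len == 0:
            eta *= decay                     # geometric decay in the tail
        x_t, y_t = X[t], y[t]
        grad = (x_t @ w - y_t) * x_t         # stochastic gradient of 0.5*(x'w - y)^2
        w -= eta * grad
    return w                                 # last iterate, no averaging

# Toy usage on synthetic overparameterized data (d > n).
rng = np.random.default_rng(0)
n, d = 200, 500
X = rng.normal(size=(n, d)) / np.sqrt(d)
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)
w_hat = last_iterate_sgd(X, y)
```

Returning the final iterate `w`, rather than an average of iterates, is the design choice whose excess risk the paper analyzes.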