Motivated by the recent successes of neural networks that have the ability to fit the data perfectly and to generalize well, we study the noiseless model in the fundamental least-squares setup. We assume that an optimum predictor perfectly fits inputs and outputs, $\langle \theta_* , \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite-dimensional non-linear feature map. To solve this problem, we consider the estimator given by the last iterate of stochastic gradient descent (SGD) with constant step-size. In this context, our contribution is twofold: (i) from a (stochastic) optimization perspective, we exhibit an archetypal problem where we can show explicitly the convergence of the final SGD iterate for a non-strongly convex problem with constant step-size, whereas usual results require some form of averaging, and (ii) from a statistical perspective, we give explicit non-asymptotic convergence rates in the over-parameterized setting and leverage a fine-grained parameterization of the problem to exhibit polynomial rates that can be faster than $O(1/T)$. The link with reproducing kernel Hilbert spaces is established.
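For concreteness, the constant step-size SGD recursion considered here can be sketched as follows; the step-size $\gamma$, the streaming samples $(x_t, y_t)$, and the horizon $T$ are notation introduced only for illustration, and the estimator is the last iterate $\theta_T$ rather than an average:
\[
\theta_{t+1} \;=\; \theta_t \;-\; \gamma \,\bigl(\langle \theta_t, \phi(x_t)\rangle - y_t\bigr)\,\phi(x_t),
\qquad t = 0, \dots, T-1 .
\]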