Extensive numerical experiments in large-scale machine learning have recently uncovered a rather counterintuitive phase transition, as a function of the ratio between the sample size and the number of parameters in the model. As the number of parameters $p$ approaches the sample size $n$, the generalisation error (a.k.a. testing error) increases, but in many cases it starts decreasing again past the threshold $p=n$. This surprising phenomenon, brought to the theoretical community's attention in \cite{belkin2019reconciling}, has been thoroughly investigated lately, mostly for models simpler than deep neural networks, such as the linear model where the parameter is taken to be the minimum-norm solution to the least-squares problem, and mostly in the asymptotic regime where $p$ and $n$ tend to $+\infty$; see e.g. \cite{hastie2019surprises}. In the present paper, we propose a finite-sample analysis of non-linear models of \textit{ridge} type, in which we investigate the double descent phenomenon for both the \textit{estimation problem} and the prediction problem. Our results show that the double descent phenomenon can be precisely demonstrated in non-linear settings, and they complement the recent works of \cite{bartlett2020benign} and \cite{chinot2020benign}. Our analysis is based on efficient but elementary tools closely related to the continuous Newton method \cite{neuberger2007continuous}.
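The linear-model instance of this phenomenon can be reproduced in a few lines. The following is a minimal numerical sketch (not the analysis of the paper): for a fixed sample size $n$, it estimates the average test error of the minimum-norm least-squares fit $\hat\beta = X^{+} y$ at an underparameterised size, at the interpolation threshold $p=n$, and in the overparameterised regime. All parameter values below (signal strength, noise level, number of trials) are illustrative choices, not taken from the source.

```python
import numpy as np

def min_norm_risk(n=40, p_list=(10, 40, 120), sigma=0.5, trials=30, seed=0):
    """Average test error of the minimum-norm least-squares estimator
    for fixed sample size n and varying number of parameters p."""
    rng = np.random.default_rng(seed)
    risks = {}
    for p in p_list:
        errs = []
        for _ in range(trials):
            # ground-truth signal of unit norm in R^p
            beta = rng.standard_normal(p)
            beta /= np.linalg.norm(beta)
            # Gaussian design and noisy responses
            X = rng.standard_normal((n, p))
            y = X @ beta + sigma * rng.standard_normal(n)
            # minimum-norm solution via the Moore-Penrose pseudoinverse
            beta_hat = np.linalg.pinv(X) @ y
            # empirical test error on a fresh (noiseless) test set
            X_test = rng.standard_normal((1000, p))
            errs.append(np.mean((X_test @ (beta_hat - beta)) ** 2))
        risks[p] = float(np.mean(errs))
    return risks
```

Running `min_norm_risk()` typically shows the characteristic peak: the error at $p=n$ dwarfs both the underparameterised error and the overparameterised one, where the risk descends again.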