Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$ norm (``ridgeless'') interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in {\mathbb R}^p$ are obtained by applying a linear transform to a vector of i.i.d.\ entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in {\mathbb R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in {\mathbb R}^d$, $W \in {\mathbb R}^{p \times d}$ a matrix of i.i.d.\ entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover -- in a precise quantitative way -- several phenomena that have been observed in large-scale neural networks and kernel machines, including the ``double descent'' behavior of the prediction risk and the potential benefits of overparametrization.
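As a concrete illustration of the setup (a minimal sketch, not the paper's code), the following Python snippet computes the minimum $\ell_2$ norm interpolator via the Moore-Penrose pseudoinverse and traces the out-of-sample risk of the linear feature model as the overparametrization ratio $\gamma = p/n$ varies. It assumes the simplest instance of the linear model: isotropic features ($\Sigma = I$), Gaussian entries, and unit signal strength; the function names `min_norm_interpolator` and `simulate_risk` are illustrative, not from the paper.

```python
import numpy as np

def min_norm_interpolator(X, y):
    # Minimum l2-norm solution among all least-squares solutions:
    # beta_hat = X^+ y (Moore-Penrose pseudoinverse). When p > n and X has
    # full row rank, this solution interpolates the training data exactly.
    return np.linalg.pinv(X) @ y

def simulate_risk(n=200, p=100, sigma=1.0, n_test=2000, seed=0):
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=p) / np.sqrt(p)   # ||beta||^2 ~ 1 (unit signal)
    X = rng.normal(size=(n, p))              # linear model with Sigma = I
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = min_norm_interpolator(X, y)
    X_test = rng.normal(size=(n_test, p))
    # Monte Carlo estimate of the prediction risk E[(x^T beta_hat - x^T beta)^2]
    return np.mean((X_test @ (beta_hat - beta)) ** 2)

if __name__ == "__main__":
    n = 200
    for p in [20, 100, 180, 200, 220, 400, 1000]:
        gamma = p / n
        print(f"gamma = p/n = {gamma:4.1f}: risk ~ {simulate_risk(n=n, p=p):.3f}")
```

Running this with $n = 200$ should show the risk blowing up as $\gamma \to 1$ (the interpolation threshold) and then descending again for $\gamma > 1$, which is the double descent shape described in the abstract.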