This work studies finite-sample properties of the risk of the minimum-norm interpolating predictor in high-dimensional regression models. If the effective rank of the covariance matrix $\Sigma$ of the $p$ regression features is much larger than the sample size $n$, we show that the min-norm interpolating predictor is not desirable, as its risk approaches the risk of trivially predicting the response by 0. However, our detailed finite-sample analysis reveals, surprisingly, that this behavior is not present when the regression response and the features are {\it jointly} low-dimensional, following a widely used factor regression model. Within this popular model class, and when the effective rank of $\Sigma$ is smaller than $n$, while still allowing for $p \gg n$, both the bias and the variance terms of the excess risk can be controlled, and the risk of the minimum-norm interpolating predictor approaches optimal benchmarks. Moreover, through a detailed analysis of the bias term, we exhibit model classes under which our upper bound on the excess risk approaches zero, while the corresponding upper bound in the recent work arXiv:1906.11300 diverges. Furthermore, we show that the minimum-norm interpolating predictor analyzed under the factor regression model, despite being model-agnostic and devoid of tuning parameters, can have risk similar to that of predictors based on principal components regression and ridge regression, and can improve over LASSO-based predictors, in the high-dimensional regime.
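As a rough illustration of the object studied here, the following sketch computes a minimum-norm interpolating predictor with the Moore-Penrose pseudoinverse on data simulated from a simple factor regression model. It is not the authors' code: the function names, the Gaussian draws, the dimensions ($n = 50$, $p = 500$, $K = 3$ latent factors) and the noise levels are all assumptions chosen for illustration.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Minimum l2-norm solution of X @ theta = y, computed via the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def make_factor_model(p, K, noise_x=0.1, noise_y=0.1, seed=0):
    """Return a sampler for a toy factor regression model (all choices hypothetical).

    Features: X = Z A^T + noise, response: y = Z beta + noise,
    with a K-dimensional latent factor Z shared by X and y.
    """
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(p, K))   # factor loadings (assumed Gaussian)
    beta = rng.normal(size=K)     # latent regression coefficients

    def draw(m):
        Z = rng.normal(size=(m, K))
        X = Z @ A.T + noise_x * rng.normal(size=(m, p))
        y = Z @ beta + noise_y * rng.normal(size=m)
        return X, y

    return draw


n, p, K = 50, 500, 3              # p >> n, with a low-dimensional latent factor
draw = make_factor_model(p=p, K=K)
X_train, y_train = draw(n)
X_test, y_test = draw(2000)

theta_hat = min_norm_interpolator(X_train, y_train)

# Empirical prediction risk of the interpolator versus always predicting 0.
risk_interp = np.mean((y_test - X_test @ theta_hat) ** 2)
risk_null = np.mean(y_test ** 2)
print(f"interpolator risk: {risk_interp:.4f}, null-predictor risk: {risk_null:.4f}")
```

Comparing `risk_interp` with `risk_null` gives an informal, simulation-level view of the comparison the abstract describes. It does not reproduce the paper's finite-sample bounds.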