We analyze the prediction error of ridge regression in an asymptotic regime where the sample size and dimension go to infinity at a proportional rate. In particular, we consider the role played by the structure of the true regression parameter. We observe that the case of a general deterministic parameter can be reduced to the case of a random parameter drawn from a structured prior. The latter assumption is a natural adaptation of classic smoothness assumptions in nonparametric regression, known as source conditions in the context of regularization theory for inverse problems. Roughly speaking, we assume that the large coefficients of the parameter are aligned with the principal components. In this setting, we obtain a precise characterization of the test error in terms of the input covariance and the structure of the regression parameter. We illustrate this characterization in a simplified setting to investigate the influence of the true parameter on optimal regularization for overparameterized models. We show that interpolation (no regularization) can be optimal even with bounded signal-to-noise ratio (SNR), provided that the parameter coefficients are larger along high-variance directions of the data, corresponding to a more regular function than posited by the regularization term. This contrasts with previous work on ridge regression with an isotropic prior, in which case interpolation is optimal only in the limit of infinite SNR.
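As an illustrative sketch (not from the paper itself), the setting above can be simulated numerically: Gaussian inputs with an anisotropic covariance, a true parameter whose coefficients are larger along high-variance directions (a simple stand-in for the source condition), and a comparison of the test error of ridge estimators, including the minimum-norm interpolator obtained as the ridgeless limit. All constants (dimensions, noise level, eigenvalue decay) are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 100, 200          # overparameterized regime: d > n
sigma = 0.5              # noise level, so the SNR is bounded

# Anisotropic input covariance: polynomially decaying eigenvalues.
eigvals = np.arange(1, d + 1, dtype=float) ** -1.0

# Stand-in for a source condition: the parameter's coefficients are
# larger along high-variance principal directions of the data.
beta = np.sqrt(eigvals)
beta /= np.linalg.norm(beta)

# Rows of X are drawn from N(0, diag(eigvals)).
X = rng.standard_normal((n, d)) * np.sqrt(eigvals)
y = X @ beta + sigma * rng.standard_normal(n)

def ridge_test_error(lam):
    """Excess test error of the ridge estimator with penalty lam.

    lam == 0 gives the minimum-norm interpolator (ridgeless limit).
    """
    if lam == 0.0:
        b_hat = np.linalg.pinv(X) @ y
    else:
        b_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    # E[(x^T (b_hat - beta))^2] under the same diagonal covariance:
    return float(np.sum(eigvals * (b_hat - beta) ** 2))

for lam in [0.0, 1e-3, 1e-1, 1.0]:
    print(f"lambda={lam:g}  test error={ridge_test_error(lam):.4f}")
```

Sweeping the penalty in this way makes it possible to check, for a given eigenvalue decay and coefficient profile, whether the error is minimized at a strictly positive penalty or already at the interpolating solution.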