We develop an approach to inference in a linear regression model when the number of potential explanatory variables is larger than the sample size. Our approach treats each regression coefficient in turn as the interest parameter, the remaining coefficients being nuisance parameters, and seeks an optimal interest-respecting transformation. The role of this transformation is to allow a marginal least squares analysis for each variable, as in a factorial experiment. One parameterization of the problem is found to be particularly convenient, both computationally and mathematically. In particular, it permits an analytic solution to the optimal transformation problem, facilitating comparison with other work. In contrast to regularized regression such as the lasso (Tibshirani, 1996) and its extensions, neither adjustment for selection nor rescaling of the explanatory variables is needed, ensuring that the physical interpretation of the regression coefficients is retained. We discuss the use of the resulting confidence intervals as part of a broader set of inferential statements, so as to reflect uncertainty over the model as well as over the parameters. The considerations involved in extending the work to other regression models are briefly discussed.