This paper develops an approach to inference in a linear regression model when the number of potential explanatory variables is larger than the sample size. The approach treats each regression coefficient in turn as the interest parameter, the remaining coefficients being nuisance parameters, and seeks an optimal interest-respecting transformation, inducing sparsity on the relevant blocks of the notional Fisher information matrix. The induced sparsity is exploited through a marginal least squares analysis for each variable, as in a factorial experiment, thereby avoiding penalization. One parameterization of the problem is found to be particularly convenient, both computationally and mathematically. In particular, it permits an analytic solution to the optimal transformation problem, facilitating theoretical analysis and comparison to other work. In contrast to regularized regression such as the lasso and its extensions, neither adjustment for selection nor rescaling of the explanatory variables is needed, ensuring the physical interpretation of regression coefficients is retained. Recommended usage is within a broader set of inferential statements, so as to reflect uncertainty over the model as well as over the parameters. The considerations involved in extending the work to other regression models are briefly discussed.
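To fix ideas, here is a minimal numerical sketch, in Python, of the marginal least squares idea the abstract alludes to: when the columns of the design matrix are mutually orthogonal, as in a factorial experiment, the simple regression of the response on each column alone recovers the corresponding multiple regression coefficient, so per-variable analysis is valid without penalization. The orthogonal design below is an illustrative stand-in for the sparsity-inducing, interest-respecting transformation developed in the paper, not the paper's actual construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative design with exactly orthonormal columns: a stand-in for the
# effect of the interest-respecting transformation, which induces sparsity
# (near-orthogonality) between each interest variable and the nuisance block.
n, p = 32, 5
Q, _ = np.linalg.qr(rng.standard_normal((n, p)))
beta = np.array([2.0, 0.0, -1.5, 0.0, 0.5])
y = Q @ beta + 0.1 * rng.standard_normal(n)

# Marginal least squares: regress y on each column separately,
# as one would analyse each factor in a factorial experiment.
marginal = np.array([(q @ y) / (q @ q) for q in Q.T])

# Full multiple regression for comparison.
full, *_ = np.linalg.lstsq(Q, y, rcond=None)

# With orthogonal columns the marginal and multiple regression estimates
# coincide, which is what makes the per-variable analysis legitimate.
print(np.allclose(marginal, full))  # True
```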