In this paper we give a completely new approach to the problem of covariate selection in linear regression. A covariate or a set of covariates is included only if it is better in the sense of least squares than the same number of Gaussian covariates consisting of i.i.d. $N(0,1)$ random variables. The Gaussian P-value is defined as the probability that the Gaussian covariates are better. It is given in terms of the Beta distribution, is exact, and holds for all data, making it model-free. The covariate selection procedures require only a cut-off value $\alpha$ for the Gaussian P-value; the default value in this paper is $\alpha=0.01$. The resulting procedures are very simple, very fast, do not overfit and require only least squares. In particular, there is no regularization parameter, no data splitting, no use of simulations and no shrinkage, and no post-selection inference is required. The paper includes the results of simulations, applications to real data sets and theorems on the asymptotic behaviour under the standard linear model. Here the step-wise procedure performs overwhelmingly better than any other procedure we are aware of. An R package {\it gausscov} is available.
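As an illustration of the definition above, the following is a minimal Python sketch of the Gaussian P-value for a single candidate covariate. It is not the {\it gausscov} API: the function names `rss` and `gaussian_p_value` are hypothetical, and the Beta parameters are an assumption not stated in the text above. They come from the standard result that, for a design with $k_0$ linearly independent columns and a single added i.i.d. $N(0,1)$ covariate $Z$, the ratio $\mathrm{RSS}_Z/\mathrm{RSS}_0$ follows a Beta$((n-k_0-1)/2,\,1/2)$ distribution whatever the data $y$ are. The sketch covers only the comparison of one covariate with one Gaussian covariate, not the selection of the best among several candidates.

```python
import numpy as np
from scipy.stats import beta


def rss(y, X):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def gaussian_p_value(y, X_current, x_new):
    """Probability that a single i.i.d. N(0,1) covariate is better than x_new.

    "Better" means a smaller residual sum of squares when added to X_current.
    Assumption: X_current has full column rank k0 (include a column of ones
    if an intercept is wanted) and n > k0 + 1.
    """
    n, k0 = X_current.shape
    rss0 = rss(y, X_current)                              # fit without x_new
    rss1 = rss(y, np.column_stack([X_current, x_new]))    # fit with x_new
    # Assumed distribution: RSS_Z / RSS_0 ~ Beta((n - k0 - 1)/2, 1/2),
    # so P(RSS_Z <= RSS_1) is the Beta CDF evaluated at rss1 / rss0.
    return float(beta.cdf(rss1 / rss0, (n - k0 - 1) / 2, 0.5))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 100
    x = rng.standard_normal(n)
    y = 1.0 + 2.0 * x + rng.standard_normal(n)
    intercept = np.ones((n, 1))
    alpha = 0.01  # default cut-off value named in the abstract
    p = gaussian_p_value(y, intercept, x)
    print("include x:", p < alpha)
```

No simulation is involved: the P-value is a single evaluation of the Beta distribution function after two least-squares fits. In this single-covariate case the value coincides numerically with the classical F-test P-value, but its interpretation does not rely on a model for $y$.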