In this paper we give a completely new approach to the problem of covariate selection in linear regression. A covariate or a set of covariates is included only if it is better, in the sense of least squares, than the same number of Gaussian covariates consisting of i.i.d. $N(0,1)$ random variables. The Gaussian P-value is defined as the probability that the Gaussian covariates are better. It is given in terms of the Beta distribution, it is exact and it holds for all data. The covariate selection procedures based on it require only a cut-off value $\alpha$ for the Gaussian P-value; the default value in this paper is $\alpha=0.01$. The resulting procedures are very simple, very fast, do not overfit and require only least squares. In particular there is no regularization parameter, no data splitting, no use of simulations, no shrinkage and no post-selection inference. The paper includes the results of simulations, applications to real data sets and theorems on the asymptotic behaviour under the standard linear model. In these comparisons the stepwise procedure performs overwhelmingly better than any other procedure we are aware of. An R-package {\it gausscov} is available.
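To make the definition concrete, the following is a minimal illustrative sketch (not the paper's exact Beta-distribution formula, which is closed form and requires no simulation): it estimates, by Monte Carlo, the probability that a single i.i.d. $N(0,1)$ covariate attains a residual sum of squares at least as small as a candidate covariate. The function and variable names are chosen here for illustration only.

```python
import numpy as np

def rss(A, y):
    """Residual sum of squares of the least-squares fit of y on the columns of A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ coef
    return float(r @ r)

def gaussian_pvalue_mc(y, x_cand, X=None, n_sim=2000, seed=0):
    """Monte Carlo estimate of the probability that an i.i.d. N(0,1) covariate
    fits y at least as well, in the sense of least squares, as the candidate
    x_cand, on top of the covariates already in X (plus an intercept).
    Illustrative only: the paper gives this probability exactly via the
    Beta distribution, with no simulation needed."""
    rng = np.random.default_rng(seed)
    n = len(y)
    base = np.ones((n, 1)) if X is None else np.column_stack([np.ones(n), X])
    rss_cand = rss(np.column_stack([base, x_cand]), y)
    wins = sum(
        rss(np.column_stack([base, rng.standard_normal(n)]), y) <= rss_cand
        for _ in range(n_sim)
    )
    return wins / n_sim
```

Under the selection rule described above, the candidate would be included only if this probability falls below the cut-off $\alpha$ (default $0.01$).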