The Projected Covariance Measure for assumption-lean variable significance testing

Testing the significance of a variable or group of variables $X$ for predicting a response $Y$, given additional covariates $Z$, is a ubiquitous task in statistics. A simple but common approach is to specify a linear model and then test whether the regression coefficient for $X$ is non-zero. However, when the model is misspecified, the test may have poor power (for example, when $X$ is involved in complex interactions) or may lead to many false rejections. In this work we study the problem of testing the model-free null of conditional mean independence, i.e. that the conditional mean of $Y$ given $X$ and $Z$ does not depend on $X$. We propose a simple and general framework that can leverage flexible nonparametric or machine learning methods, such as additive models or random forests, to yield both robust error control and high power. The procedure uses these methods to perform regressions: first to estimate a form of projection of $Y$ on $X$ and $Z$ using one half of the data, and then to estimate the expected conditional covariance between this projection and $Y$ on the remaining half of the data. While the approach is general, we show that a version of our procedure using spline regression achieves a rate that we show is minimax optimal for this nonparametric testing problem. Numerical experiments demonstrate the effectiveness of our approach, in terms of both Type I error control and power, compared to several existing approaches.
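The two-stage, sample-splitting procedure described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a scalar $X$ and $Z$, swaps in polynomial least squares for the flexible regression method, omits any variance weighting or other refinements the full method may use, and calibrates a one-sided test against a standard normal. The names `poly_features`, `fit` and `pcm_sketch` are hypothetical.

```python
import math
import numpy as np

def poly_features(*cols, degree=3):
    """Intercept, powers of each column up to `degree`, and pairwise products."""
    cols = [np.asarray(c, dtype=float) for c in cols]
    feats = [np.ones_like(cols[0])]
    for c in cols:
        feats += [c ** d for d in range(1, degree + 1)]
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            feats.append(cols[i] * cols[j])
    return np.column_stack(feats)

def fit(features, target):
    """Least-squares coefficients (stand-in for any flexible regression)."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef

def pcm_sketch(x, y, z, seed=0):
    """Return (statistic, one-sided p-value) for H0: E[Y | X, Z] = E[Y | Z]."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    a, b = idx[: len(y) // 2], idx[len(y) // 2:]

    # Half A: regress Y on (X, Z), then remove the part explained by Z alone.
    g = fit(poly_features(x[a], z[a]), y[a])
    m = fit(poly_features(z[a]), poly_features(x[a], z[a]) @ g)

    def f_hat(x_new, z_new):
        return poly_features(x_new, z_new) @ g - poly_features(z_new) @ m

    # Half B: residualise Y and f_hat(X, Z) on Z, then average the products,
    # which estimates the expected conditional covariance given Z.
    fz = poly_features(z[b])
    res_y = y[b] - fz @ fit(fz, y[b])
    f_b = f_hat(x[b], z[b])
    res_f = f_b - fz @ fit(fz, f_b)
    prod = res_y * res_f

    stat = math.sqrt(len(b)) * prod.mean() / prod.std(ddof=1)
    p_value = 0.5 * math.erfc(stat / math.sqrt(2))  # upper normal tail
    return stat, p_value

# Simulated data (an assumption for illustration): X is correlated with Z.
rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)
x = z + rng.normal(size=n)
noise = rng.normal(size=n)
y_null = np.sin(z) + noise          # conditional mean does not depend on X
y_alt = np.sin(z) + x ** 2 + noise  # X enters nonlinearly

stat_null, p_null = pcm_sketch(x, y_null, z)
stat_alt, p_alt = pcm_sketch(x, y_alt, z)
print(f"null: stat={stat_null:.2f}, p={p_null:.3f}")
print(f"alt:  stat={stat_alt:.2f}, p={p_alt:.3g}")
```

In this sketch the test is one-sided because the projection learned on the first half is meant to correlate positively with the residual of $Y$ under the alternative. Any regression method could replace `fit`, for example an additive model or a random forest.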
