For testing conditional independence (CI) of a response Y and a predictor X given covariates Z, the recently introduced model-X (MX) framework has been the subject of active methodological research, especially in the context of MX knockoffs and their successful application to genome-wide association studies. In this paper, we study the power of MX CI tests, yielding quantitative explanations for empirically observed phenomena and novel insights to guide the design of MX methodology. We show that any valid MX CI test must also be valid conditionally on Y and Z; this conditioning allows us to reformulate the problem as testing a point null hypothesis involving the conditional distribution of X. The Neyman-Pearson lemma then implies that the conditional randomization test (CRT) based on a likelihood statistic is the most powerful MX CI test against a point alternative. We also obtain a related optimality result for MX knockoffs. Switching to an asymptotic framework with arbitrarily growing covariate dimension, we derive an expression for the limiting power of the CRT against local semiparametric alternatives in terms of the prediction error of the machine learning algorithm on which its test statistic is based. Finally, we exhibit a resampling-free test with uniform asymptotic Type-I error control under the assumption that only the first two moments of X given Z are known, a significant relaxation of the MX assumption.
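To make the conditional randomization test (CRT) mentioned above concrete, here is a minimal sketch under the MX assumption that the distribution of X given Z is known exactly. Everything specific in it is an illustrative assumption rather than the paper's construction: the Gaussian model X | Z ~ N(Zγ, 1), the linear model for Y, the helper names (`crt_pvalue`, `make_statistic`, `demo`), and the residual-product statistic, which is a simple stand-in and not the likelihood statistic that the optimality result refers to.

```python
import numpy as np


def crt_pvalue(x, y, z, sample_x_given_z, statistic, n_resamples=500, rng=None):
    """Conditional randomization test p-value for H0: Y independent of X given Z.

    sample_x_given_z(z, rng) must draw a fresh X from the (assumed known)
    conditional law of X given Z; statistic(x, y, z) returns a scalar for
    which larger values count as evidence against H0.
    """
    rng = np.random.default_rng(rng)
    t_obs = statistic(x, y, z)
    # Resample X from its conditional law while holding Y and Z fixed.
    t_null = np.array(
        [statistic(sample_x_given_z(z, rng), y, z) for _ in range(n_resamples)]
    )
    # The +1 terms give a finite-sample valid p-value.
    return (1 + np.sum(t_null >= t_obs)) / (1 + n_resamples)


def make_statistic(gamma):
    """Illustrative statistic: |sum of (X - E[X|Z]) times the OLS residual of Y on Z|."""
    def statistic(x, y, z):
        theta_hat = np.linalg.lstsq(z, y, rcond=None)[0]
        return abs(np.sum((x - z @ gamma) * (y - z @ theta_hat)))
    return statistic


def demo(beta, seed, n=300, p=5):
    """Simulate data with effect size beta (beta = 0 is the null) and run the CRT."""
    rng = np.random.default_rng(seed)
    gamma = np.full(p, 0.5)   # assumed known: X | Z ~ N(Z @ gamma, 1)
    theta = np.ones(p)        # unknown to the test; used only to simulate Y
    z = rng.normal(size=(n, p))
    x = z @ gamma + rng.normal(size=n)
    y = beta * x + z @ theta + rng.normal(size=n)

    def sample_x_given_z(z_, rng_):
        return z_ @ gamma + rng_.normal(size=z_.shape[0])

    return crt_pvalue(x, y, z, sample_x_given_z, make_statistic(gamma), rng=rng)


if __name__ == "__main__":
    print("p-value under the null (beta = 0):", demo(0.0, seed=0))
    print("p-value under an alternative (beta = 1):", demo(1.0, seed=0))
```

Because Y and Z are held fixed while only X is redrawn, the resampled statistics are exchangeable with the observed one under the null, which is why the test remains valid conditionally on Y and Z whatever statistic is plugged in. The choice of statistic affects only power, which is the question the paper studies.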