For testing conditional independence (CI) of a response Y and a predictor X given covariates Z, the recently introduced model-X (MX) framework has been the subject of active methodological research, especially in the context of MX knockoffs and their successful application to genome-wide association studies. In this paper, we study the power of MX CI tests, yielding quantitative insights into the role of machine learning and providing evidence in favor of using likelihood-based statistics in practice. Focusing on the conditional randomization test (CRT), we find that its conditional mode of inference allows us to reformulate it as testing a point null hypothesis involving the conditional distribution of X. The Neyman-Pearson lemma then implies that a likelihood-based statistic yields the most powerful CRT against a point alternative. We also obtain a related optimality result for MX knockoffs. Switching to an asymptotic framework with arbitrarily growing covariate dimension, we derive an expression for the limiting power of the CRT against local semiparametric alternatives in terms of the prediction error of the machine learning algorithm on which its test statistic is based. Finally, we exhibit a resampling-free test with uniform asymptotic Type-I error control under the assumption that only the first two moments of X given Z are known, a significant relaxation of the MX assumption.
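The conditional randomization test mentioned above can be sketched in a few lines: resample X from its known conditional distribution given Z, recompute the test statistic on each resample, and compare with the observed value. The Gaussian model for X given Z, the residual-correlation statistic, and all parameter values below are illustrative assumptions for a minimal sketch, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Model-X assumption (illustrative): X | Z ~ N(Z @ beta, 1) with beta known.
n, p = 200, 5
Z = rng.normal(size=(n, p))
beta = np.ones(p) / np.sqrt(p)
X = Z @ beta + rng.normal(size=n)
Y = Z @ beta + rng.normal(size=n)  # Y independent of X given Z: null holds

def statistic(x, y, Z):
    # Simple stand-in statistic: |<x, residual of y on Z>|; in practice a
    # machine-learning-based or likelihood-based statistic would be used.
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return abs(np.dot(x, resid))

T_obs = statistic(X, Y, Z)

# CRT: draw fresh copies of X from its conditional distribution given Z,
# recompute the statistic, and form the resampling p-value.
B = 500
T_null = np.array([statistic(Z @ beta + rng.normal(size=n), Y, Z)
                   for _ in range(B)])
p_value = (1 + np.sum(T_null >= T_obs)) / (B + 1)
```

Because the null holds in this simulation, the p-value is approximately uniform on (0, 1); rejecting when it falls below a level alpha controls Type-I error exactly in finite samples, which is the CRT's key property.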