Model-X approaches to testing conditional independence between a predictor and an outcome variable given a vector of covariates usually assume exact knowledge of the conditional distribution of the predictor given the covariates. Nevertheless, model-X methodologies are often deployed with this conditional distribution learned in sample. We investigate the consequences of this choice through the lens of the distilled conditional randomization test (dCRT). We find that Type-I error control is still possible, but only if the mean of the outcome variable given the covariates is estimated well enough. This demonstrates that the dCRT is doubly robust, and motivates a comparison to the generalized covariance measure (GCM) test, another doubly robust conditional independence test. We prove that these two tests are asymptotically equivalent, and show that the GCM test is optimal against (generalized) partially linear alternatives by leveraging semiparametric efficiency theory. In an extensive simulation study, we compare the dCRT to the GCM test. The two tests have broadly similar Type-I error and power, though the dCRT can have somewhat better Type-I error control but somewhat worse power in small samples or when the response is discrete. We also find that post-lasso-based test statistics (as compared to lasso-based statistics) can dramatically improve Type-I error control for both methods.
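To make the GCM test concrete, the following is a minimal sketch rather than the implementation studied here. It assumes ordinary least squares for both regressions (predictor on covariates, outcome on covariates), whereas the simulations above use lasso and post-lasso fits; any regression method could be substituted. The function names `ols_residuals` and `gcm_test` are hypothetical. The statistic is the normalized mean of the products of the two residual vectors, compared against a standard normal reference.

```python
import math
import numpy as np


def ols_residuals(target, Z):
    """Residuals of `target` after least-squares regression on Z plus an intercept.

    OLS is an illustrative assumption; it stands in for any regression method.
    """
    design = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def gcm_test(X, Y, Z):
    """Return (statistic, two-sided p-value) for H0: X independent of Y given Z."""
    rx = ols_residuals(X, Z)  # X minus its estimated conditional mean given Z
    ry = ols_residuals(Y, Z)  # Y minus its estimated conditional mean given Z
    prod = rx * ry
    n = len(prod)
    # sqrt(n) * mean / standard deviation of the residual products
    stat = math.sqrt(n) * prod.mean() / prod.std()
    # Two-sided p-value under a standard normal reference distribution
    pval = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, pval


# Simulated example: Y depends on X only through Z (null) or directly (alternative).
rng = np.random.default_rng(0)
n, p = 2000, 5
Z = rng.normal(size=(n, p))
X = Z @ np.ones(p) + rng.normal(size=n)
Y_null = Z @ np.ones(p) + rng.normal(size=n)
Y_alt = X + Z @ np.ones(p) + rng.normal(size=n)

print("null:        stat=%.3f, p=%.3g" % gcm_test(X, Y_null, Z))
print("alternative: stat=%.3f, p=%.3g" % gcm_test(X, Y_alt, Z))
```

Because the statistic multiplies the two residual vectors, an error in one regression can be offset by accuracy in the other, which is the double robustness referred to above.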