Model-X approaches to testing conditional independence between a predictor and an outcome variable given a vector of covariates usually assume exact knowledge of the conditional distribution of the predictor given the covariates. Nevertheless, model-X methodologies are often deployed with this conditional distribution learned in sample. We investigate the consequences of this choice through the lens of the distilled conditional randomization test (dCRT). We find that Type-I error control is still possible, but only if the mean of the outcome variable given the covariates is estimated well enough. This demonstrates that the dCRT is doubly robust, and motivates a comparison to the generalized covariance measure (GCM) test, another doubly robust conditional independence test. We prove that these two tests are asymptotically equivalent, and show that the GCM test is in fact optimal against (generalized) partially linear alternatives by leveraging semiparametric efficiency theory. In an extensive simulation study, we compare the dCRT to the GCM test. We find that the GCM test and the dCRT are quite similar in terms of both Type-I error and power, and that post-lasso based test statistics (as compared to lasso based statistics) can dramatically improve Type-I error control for both methods.
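To make the GCM test mentioned above concrete, here is a minimal sketch of its statistic: regress the predictor and the outcome on the covariates, multiply the two residual vectors elementwise, and compare the normalized mean of the products to a standard normal. The function name `gcm_test`, the use of ordinary least squares for both regressions, and the simulated data are assumptions made for illustration only; the abstract refers to lasso and post-lasso based statistics, which this sketch does not implement.

```python
import math

import numpy as np


def gcm_test(x, y, z):
    """Sketch of a GCM test of X independent of Y given Z.

    Assumption: both conditional means are fitted by ordinary least squares.
    Returns the test statistic and a two-sided normal p-value.
    """
    n = len(x)
    design = np.column_stack([np.ones(n), z])  # intercept plus covariates

    # Residuals of X given Z and of Y given Z.
    x_res = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    y_res = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]

    # Normalized mean of the residual products.
    prod = x_res * y_res
    stat = math.sqrt(n) * prod.mean() / prod.std()

    # Two-sided p-value under the N(0, 1) reference distribution.
    p_value = math.erfc(abs(stat) / math.sqrt(2))
    return stat, p_value


# Illustrative simulated data (hypothetical, not from the paper).
rng = np.random.default_rng(0)
n, p = 500, 5
z = rng.normal(size=(n, p))
x = z @ rng.normal(size=p) + rng.normal(size=n)
y_null = z @ rng.normal(size=p) + rng.normal(size=n)  # Y independent of X given Z
y_alt = y_null + 0.5 * x                              # Y depends on X given Z

stat_null, p_null = gcm_test(x, y_null, z)
stat_alt, p_alt = gcm_test(x, y_alt, z)
print(f"null: stat={stat_null:.2f}, p={p_null:.3f}")
print(f"alt:  stat={stat_alt:.2f}, p={p_alt:.3g}")
```

Because the statistic is built from a product of two residuals, an error in one regression can be offset by accuracy in the other, which is the sense in which the test is doubly robust. Swapping the least-squares fits for another regression method changes only the two residual lines.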