The model-X conditional randomization test (CRT) is a flexible and powerful procedure for testing the conditional independence hypothesis that X is independent of Y given Z. Despite its many attractive properties, the model-X CRT relies on the model-X assumption that the distribution of X | Z is known exactly. If this distribution is misspecified, the test may lose its validity. The problem is more severe when the adjustment covariates Z are high-dimensional, since precisely modeling X given Z is then difficult. In response, we propose the Maxway (Model and Adjust X With the Assistance of Y) CRT, which learns the distribution of Y | Z and uses it to calibrate the resampling distribution of X, gaining robustness to errors in modeling X. We prove that the type-I error inflation of the Maxway CRT is controlled by the learning error of the low-dimensional adjusting model plus the product of the learning errors for X | Z and Y | Z, which can be interpreted as an "almost doubly robust" property. Building on this result, we develop algorithms implementing the Maxway CRT in practical scenarios, including (surrogate-assisted) semi-supervised learning and transfer learning, where valid information about Y | Z may be provided by auxiliary or external data. Through extensive simulation studies under different scenarios, we demonstrate that the Maxway CRT achieves significantly better type-I error control than existing model-X inference approaches while preserving similar power. Finally, we apply our methodology to two real examples: (1) studying the obesity paradox with electronic health record (EHR) data assisted by surrogate variables; and (2) inferring the side effect of statins among an ethnic minority group by transferring knowledge from the majority group.
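For readers unfamiliar with the baseline procedure, the following is a minimal sketch of the standard (unadjusted) model-X CRT that the abstract takes as its starting point. It does not implement the Maxway adjustment. The function names (`crt_pvalue`, `sample_x_given_z`, `statistic`), the Gaussian linear toy model, and the residual-based test statistic are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def crt_pvalue(x, y, z, sample_x_given_z, statistic, n_resamples=199, seed=0):
    """Vanilla model-X CRT p-value (illustrative; no Maxway adjustment).

    sample_x_given_z(z, rng) must draw a fresh copy of X from the assumed
    distribution of X | Z; the test is only valid if that model is correct.
    """
    rng = np.random.default_rng(seed)
    t_obs = statistic(x, y, z)
    # Recompute the statistic on resampled copies of X, holding Y and Z fixed.
    t_null = np.array(
        [statistic(sample_x_given_z(z, rng), y, z) for _ in range(n_resamples)]
    )
    # Finite-sample valid p-value: rank of the observed statistic among resamples.
    return (1 + np.sum(t_null >= t_obs)) / (n_resamples + 1)


# Toy data (assumed model): X | Z ~ N(Z @ beta, 1), with beta treated as known.
data_rng = np.random.default_rng(1)
n, d = 500, 3
beta = np.array([1.0, 0.0, 0.0])
gamma = np.array([0.5, -0.5, 0.0])
z = data_rng.normal(size=(n, d))
x = z @ beta + data_rng.normal(size=n)
y_null = z @ gamma + data_rng.normal(size=n)  # Y depends on Z only
y_alt = 2.0 * x + y_null                      # Y also depends on X


def sample_x_given_z(z, rng):
    # Draw X from the assumed conditional distribution X | Z.
    return z @ beta + rng.normal(size=z.shape[0])


def statistic(x, y, z):
    # Absolute sample covariance between Y and the part of X not explained by Z.
    return abs(np.mean((x - z @ beta) * y))


p_null = crt_pvalue(x, y_null, z, sample_x_given_z, statistic)
p_alt = crt_pvalue(x, y_alt, z, sample_x_given_z, statistic)
print(f"p-value under the null: {p_null:.3f}")
print(f"p-value under the alternative: {p_alt:.3f}")
```

In this sketch the sampler uses the true conditional distribution of X | Z, which is exactly the model-X assumption; replacing `beta` inside `sample_x_given_z` with a poor estimate is a simple way to see the loss of validity that motivates the Maxway CRT.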