In causal inference, matching is one of the most widely used methods for mimicking a randomized experiment with observational (non-experimental) data. Ideally, treated units are exactly matched to control units on the covariates, so that treatments are as-if randomly assigned within each matched set and valid randomization tests for treatment effects can be conducted as in a randomized experiment. In practice, however, matching is typically inexact, especially when there are many or continuous observed covariates or when unobserved covariates exist. Previous matched observational studies have routinely conducted downstream randomization tests as if matching were exact, as long as the matched datasets satisfied some prespecified balance criteria or passed some balance tests. Recent studies have shown that this routine practice can severely inflate the type-I error rate of randomization tests, especially when the sample size is large. To address this problem, we propose an iterative convex programming framework for randomization tests with inexactly matched datasets. Under some commonly used regularity conditions, we show that our approach yields valid randomization tests (i.e., tests that robustly control the type-I error rate) for any inexactly matched dataset, even when unobserved covariates exist. Our framework also allows flexible machine learning models to be incorporated to better extract information from covariate imbalance while still robustly controlling the type-I error rate.
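To make the routine practice concrete, here is a minimal sketch of the conventional randomization test for matched pairs that treats matching as exact: within each pair, either unit is assumed to receive treatment with probability 1/2, so under Fisher's sharp null of no effect, swapping labels within a pair simply flips the sign of the within-pair outcome difference. This is not the proposed iterative convex programming framework. The function name `matched_pair_randomization_test`, the difference-in-means statistic, and the Monte Carlo approximation are illustrative assumptions.

```python
import random
from statistics import mean


def matched_pair_randomization_test(y_treated, y_control, n_draws=10_000, seed=0):
    """Monte Carlo randomization test for matched pairs (illustrative sketch).

    Assumes exact matching: within each pair, treatment is assigned to either
    unit with probability 1/2, independently across pairs. Returns the observed
    mean within-pair difference and a two-sided Monte Carlo p-value.
    """
    # Within-pair outcome differences (treated minus control).
    d = [t - c for t, c in zip(y_treated, y_control)]
    observed = mean(d)

    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_draws):
        # Under the sharp null, re-randomizing treatment within a pair flips
        # the sign of that pair's difference with probability 1/2.
        stat = mean(x if rng.random() < 0.5 else -x for x in d)
        # Small tolerance guards against floating-point ties.
        if abs(stat) >= abs(observed) - 1e-12:
            extreme += 1

    # The +1 correction keeps the Monte Carlo p-value valid (never exactly 0).
    p_value = (1 + extreme) / (1 + n_draws)
    return observed, p_value
```

One way to see the problem the abstract raises: under inexact matching, the true within-pair assignment probabilities can depart from 1/2, so the uniform sign flips in this sketch no longer reflect the actual assignment mechanism, and the resulting p-values need not control the type-I error rate.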