In many empirical settings, directly observing a treatment variable is infeasible, although an error-prone surrogate measurement of it is often available. Causal inference based solely on the surrogate measurement is particularly challenging without validation data. We propose a method that obviates the need for validation data by carefully combining the surrogate measurement with a proxy of the hidden treatment to obtain nonparametric identification of several causal effects of interest, including the population average treatment effect, the effect of treatment on the treated, quantile treatment effects, and causal effects under marginal structural models. For inference, we provide general semiparametric theory for causal effects identified using our approach and derive a large class of semiparametric efficient estimators with an appealing multiple robustness property. A significant obstacle to our approach is the estimation of nuisance functions that involve the hidden treatment, which prevents the direct use of standard machine learning algorithms; we resolve this by introducing a novel semiparametric EM algorithm. We examine the finite-sample performance of our method using simulations and an application that aims to estimate the causal effect of Alzheimer's disease on hippocampal volume using data from the Alzheimer's Disease Neuroimaging Initiative.