In many empirical settings, directly observing a treatment variable may be infeasible although an error-prone surrogate measurement of the latter will often be available. Causal inference based solely on the observed surrogate measurement of the hidden treatment may be particularly challenging without an additional assumption or auxiliary data. To address this issue, we propose a method that carefully incorporates the surrogate measurement together with a proxy of the hidden treatment to identify its causal effect on any scale for which identification would in principle be feasible had contrary to fact the treatment been observed error-free. Beyond identification, we provide general semiparametric theory for causal effects identified using our approach, and we derive a large class of semiparametric estimators with an appealing multiple robustness property. A significant obstacle to our approach is the estimation of nuisance functions involving the hidden treatment, which prevents the direct application of standard machine learning algorithms. To resolve this, we introduce a novel semiparametric EM algorithm, thus adding a practical dimension to our theoretical contributions. This methodology can be adapted to analyze a large class of causal parameters in the proposed hidden treatment model, including the population average treatment effect, the effect of treatment on the treated, quantile treatment effects, and causal effects under marginal structural models. We examine the finite-sample performance of our method using simulations and an application which aims to estimate the causal effect of Alzheimer's disease on hippocampal volume using data from the Alzheimer's Disease Neuroimaging Initiative.
翻译:暂无翻译