We study identifying and estimating the causal effect of a treatment variable on a long-term outcome using data from an observational and an experimental domain. The observational data are subject to unobserved confounding. Furthermore, subjects in the experiment are only followed for a short period; thus, long-term effects are unobserved, though short-term effects are available. Consequently, neither data source alone suffices for causal inference on the long-term outcome, necessitating a principled fusion of the two. We propose three approaches for data fusion for the purpose of identifying and estimating the causal effect. The first assumes equal confounding bias for short-term and long-term outcomes. The second weakens this assumption by leveraging an observed confounder for which the short-term and long-term potential outcomes share the same partial additive association with this confounder. The third approach employs proxy variables of the latent confounder of the treatment-outcome relationship, extending the proximal causal inference framework to the data fusion setting. For each approach, we develop influence function-based estimators and analyze their robustness properties. We illustrate our methods by estimating the effect of class size on 8th-grade SAT scores using data from the Project STAR experiment combined with observational data from the Early Childhood Longitudinal Study.
翻译:暂无翻译