We study the identification and estimation of long-term treatment effects when both experimental and observational data are available. Because the long-term outcome is observed only after a long delay, it is not measured in the experimental data and is recorded only in the observational data. Both types of data, however, include observations of some short-term outcomes. In this paper, we tackle the challenge of persistent unmeasured confounders, i.e., unmeasured confounders that can simultaneously affect the treatment, the short-term outcomes, and the long-term outcome, and we note that they invalidate the identification strategies in the previous literature. To address this challenge, we exploit the sequential structure of multiple short-term outcomes and develop three novel identification strategies for the average long-term treatment effect. We further propose three corresponding estimators and prove their consistency and asymptotic normality. Finally, we apply our methods to estimate the effect of a job training program on long-term employment using semi-synthetic data. Numerical results show that our proposals outperform existing methods that fail to handle persistent confounders.