Understanding and quantifying cause and effect is an important problem in many domains. The generally agreed solution to this problem is to perform a randomised controlled trial. However, even when randomised controlled trials can be performed, they usually have relatively short durations due to cost considerations. This makes learning long-term causal effects a very challenging task in practice, since the long-term outcome is only observed after a long delay. In this paper, we study the identification and estimation of long-term treatment effects when both experimental and observational data are available. Previous work provided an estimation strategy to determine long-term causal effects from such data regimes. However, this strategy only works if one assumes there are no unobserved confounders in the observational data. In this paper, we specifically address the challenging case where unmeasured confounders are present in the observational data. Our long-term causal effect estimator is obtained by combining regression residuals with short-term experimental outcomes in a specific manner to create an instrumental variable, which is then used to quantify the long-term causal effect through instrumental variable regression. We prove this estimator is unbiased, and analytically study its variance. In the context of the front-door causal structure, this provides a new causal estimator, which may be of independent interest. Finally, we empirically test our approach on synthetic data, as well as real data from the International Stroke Trial.