Causal inference is the process of using assumptions, study designs, and estimation strategies to draw conclusions about the causal relationships between variables from data. It allows researchers to better understand the mechanisms underlying complex systems and to make better-informed decisions. In many settings, we cannot observe all the confounders that affect both the treatment and the outcome, which complicates the estimation of causal effects. To address this problem, a growing literature in both causal inference and machine learning proposes using Instrumental Variables (IVs). This paper serves as the first effort to systematically and comprehensively introduce and discuss IV methods and their applications in both causal inference and machine learning. First, we give the formal definition of IVs and discuss the identification problem of IV regression methods under different assumptions. Second, we categorize existing work on IV methods into three streams according to the focus of the proposed methods: two-stage least squares with IVs, control function with IVs, and evaluation of IVs. For each stream, we present both classical causal inference methods and recent developments in the machine learning literature. Then, we introduce a variety of real-world applications of IV methods and summarize the available datasets and algorithms. Finally, we summarize the literature, discuss open problems, and suggest promising future research directions for IV methods and their applications. We also provide a toolkit of the IV methods reviewed in this survey at https://github.com/causal-machine-learning-lab/mliv.
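To make the first two streams concrete, here is a minimal sketch of textbook two-stage least squares (2SLS) and a linear control function on synthetic data. It is not the API of the mliv toolkit. The function names, the simulated data-generating process, and the linear, single-treatment setting are all illustrative assumptions. An unobserved confounder `u` drives both the treatment `d` and the outcome `y`, so naive OLS is biased. The instrument `z` shifts `d` but affects `y` only through `d`, which lets both IV estimators recover the assumed true effect of 2.0.

```python
import numpy as np


def ols_slope(y, d):
    """Naive OLS slope of y on [1, d]; biased when d is confounded."""
    X = np.column_stack([np.ones(len(y)), d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def two_stage_least_squares(y, d, z):
    """Textbook 2SLS with one endogenous treatment d and instrument(s) z (hypothetical helper)."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    # Stage 1: project the treatment onto the instrument space.
    gamma, *_ = np.linalg.lstsq(Z, d, rcond=None)
    d_hat = Z @ gamma
    # Stage 2: regress the outcome on the predicted treatment.
    X_hat = np.column_stack([np.ones(n), d_hat])
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta[1]


def control_function(y, d, z):
    """Linear control function (hypothetical helper): add the first-stage residual as a regressor."""
    n = len(y)
    Z = np.column_stack([np.ones(n), z])
    gamma, *_ = np.linalg.lstsq(Z, d, rcond=None)
    v_hat = d - Z @ gamma  # first-stage residual: the part of d correlated with u
    X = np.column_stack([np.ones(n), d, v_hat])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]  # coefficient on d, with the confounding absorbed by v_hat


# Assumed synthetic data-generating process (illustration only).
rng = np.random.default_rng(0)
n = 50_000
u = rng.normal(size=n)  # unobserved confounder
z = rng.normal(size=n)  # instrument: relevant for d, excluded from y
d = 0.8 * z + u + rng.normal(size=n)
y = 2.0 * d + 3.0 * u + rng.normal(size=n)  # true causal effect of d is 2.0

ols_est = ols_slope(y, d)
iv_est = two_stage_least_squares(y, d, z)
cf_est = control_function(y, d, z)
print(f"OLS: {ols_est:.3f}  2SLS: {iv_est:.3f}  control function: {cf_est:.3f}")
```

Under this assumed model, OLS converges to about 3.14 rather than 2.0. In the linear case with the same instruments, the control-function coefficient on `d` is algebraically identical to the 2SLS estimate. The two streams diverge once the first or second stage is nonlinear, which is where the machine-learning variants come in.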