Instrumental variable (IV) estimation is a powerful approach to inferring the causal effect of a treatment on an outcome of interest from observational data, even when there are latent confounders between the treatment and the outcome. However, existing IV methods require that an IV be selected and justified with domain knowledge, and an invalid IV may lead to biased estimates. Hence, discovering a valid IV is critical to the application of IV methods. In this paper, we study and design a data-driven algorithm to discover valid IVs from data under mild assumptions. We develop theory based on partial ancestral graphs (PAGs) to support the search for a set of candidate ancestral IVs (AIVs) and, for each candidate AIV, the identification of its conditioning set. Based on this theory, we propose a data-driven algorithm to discover a pair of IVs from data. Experiments on synthetic and real-world datasets show that the developed IV discovery algorithm yields accurate estimates of causal effects in comparison with state-of-the-art IV-based causal effect estimators.
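To make the motivation concrete, the following is a minimal sketch of why a valid IV helps when a confounder is unobserved. It is not the AIV discovery algorithm described above: it uses the textbook single-instrument ratio (Wald) estimator on simulated linear data, and every name, coefficient, and sample size in it is an assumption chosen for illustration.

```python
import numpy as np


def simulate(n=20000, beta=2.0, seed=0):
    """Simulate linear data with a latent confounder (illustrative assumption).

    u is unobserved and affects both treatment w and outcome y.
    z is a valid instrument: it affects w, is independent of u,
    and influences y only through w.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)                       # latent confounder
    z = rng.normal(size=n)                       # instrument
    w = 1.0 * z + 1.0 * u + rng.normal(size=n)   # treatment
    y = beta * w + 1.5 * u + rng.normal(size=n)  # outcome
    return z, w, y


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x (with intercept)."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (xc @ xc))


def iv_ratio(z, w, y):
    """Ratio (Wald) IV estimator: cov(z, y) / cov(z, w)."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (w - w.mean())))


if __name__ == "__main__":
    z, w, y = simulate()
    print("naive OLS estimate:", ols_slope(w, y))
    print("IV estimate:       ", iv_ratio(z, w, y))
```

In this setup the naive regression of the outcome on the treatment is pulled above the true effect of 2.0 by the latent confounder, while the IV ratio stays close to it. The ratio is only trustworthy because `z` is valid by construction, which is the condition that normally has to be argued from domain knowledge.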