Instrumental variable methods provide useful tools for inferring causal effects in the presence of unmeasured confounding. A major challenge in applying these methods to large-scale data sets is finding valid instruments in a possibly large candidate set. In practice, most candidate instruments are typically not relevant to the particular exposure of interest. Moreover, not all relevant candidate instruments are valid, as some may directly influence the outcome of interest. In this article, we propose a data-driven method for causal inference with many candidate instruments that addresses these two challenges simultaneously. A key component of our proposal is a novel resampling method, which constructs pseudo variables to remove irrelevant candidate instruments that have spurious correlations with the exposure. Synthetic data analyses show that the proposed method performs favourably compared to existing methods. We apply our method to a Mendelian randomization study estimating the effect of obesity on health-related quality of life.
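To make the idea of screening with pseudo variables concrete, the following is a minimal sketch and not the procedure proposed in the article, whose details are not given here. It assumes one simple way of building pseudo variables: permuting the rows of the candidate matrix so that the pseudo variables keep their marginal distributions but are unrelated to the exposure. The largest absolute pseudo-variable correlation then serves as a reference level for spurious correlation. The function name `screen_with_pseudo_variables` and the parameters `n_resamples` and `quantile` are hypothetical choices made for illustration.

```python
import numpy as np


def screen_with_pseudo_variables(Z, d, n_resamples=50, quantile=0.95, seed=0):
    """Illustrative relevance screening (an assumption, not the article's method).

    Z : (n, p) array of candidate instruments (columns must not be constant).
    d : (n,) array holding the exposure.
    Returns the indices of candidates whose absolute sample correlation with
    the exposure exceeds a threshold learned from permuted pseudo variables.
    """
    rng = np.random.default_rng(seed)
    n, _ = Z.shape

    # Standardize so that Zc.T @ dc / n is the vector of sample correlations.
    Zc = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    dc = (d - d.mean()) / d.std()
    observed = np.abs(Zc.T @ dc) / n

    # Each resample permutes the rows of Z, which breaks any link to the
    # exposure. Record the largest spurious correlation in each resample.
    null_max = np.empty(n_resamples)
    for b in range(n_resamples):
        pseudo = Zc[rng.permutation(n)]
        null_max[b] = np.max(np.abs(pseudo.T @ dc) / n)

    # Keep only candidates that beat a high quantile of the spurious maxima.
    threshold = np.quantile(null_max, quantile)
    selected = np.flatnonzero(observed > threshold)
    return selected, threshold


if __name__ == "__main__":
    # Simulated data: only the first 5 of 200 candidates affect the exposure.
    rng = np.random.default_rng(1)
    n, p = 1000, 200
    Z = rng.normal(size=(n, p))
    d = 0.5 * Z[:, :5].sum(axis=1) + rng.normal(size=n)
    selected, threshold = screen_with_pseudo_variables(Z, d)
    print("selected candidates:", selected)
    print("threshold:", threshold)
```

This sketch addresses relevance only. Checking validity, that is, whether a relevant candidate also affects the outcome directly, is a separate step that it does not attempt.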