Instrumental variable methods are a fundamental statistical approach to estimation in the presence of unmeasured confounding, which commonly arises in the non-randomized observational data typical of fields such as economics and public health. However, such methods usually make restrictive linearity and additivity assumptions that are ill-suited to today's complex modeling challenges. As the body of observational data grows, analysts will need flexible regression models that can still control for confounding using instrumental variables. This article therefore presents a nonlinear instrumental variable regression model, based on Bayesian regression tree ensembles, that estimates such relationships, including interactions, in the presence of confounding. One exciting application of this method is Mendelian randomization, in which genetic variants serve as the instruments. Body mass index is one factor hypothesized to have a nonlinear relationship with cardiovascular risk factors such as blood pressure, with an effect that may interact with age. Heterogeneity across patient characteristics such as age could be clinically interesting from a precision medicine perspective, where individualized treatment is emphasized. We illustrate our flexible Bayesian instrumental variable regression tree method with an example from the UK Biobank, in which body mass index is related to blood pressure using genetic variants as the instruments.