We consider the problem of selecting confounders for adjustment from a potentially large set of covariates when estimating a causal effect. Recently, the high-dimensional Propensity Score (hdPS) method was developed for this task; hdPS ranks potential confounders by estimating an importance score for each variable and selects the top few variables. However, this ranking procedure is limited: it requires all variables to be binary. We propose an extension of the hdPS to general types of response and confounder variables. We further develop a group importance score, allowing us to rank groups of potential confounders. The main challenge is that our parameter requires estimating either the propensity score or the response model, both of which are vulnerable to model misspecification. We propose a targeted maximum likelihood estimator (TMLE), which allows the use of nonparametric machine learning tools for fitting these intermediate models. We establish the asymptotic normality of our estimator, which in turn allows us to construct confidence intervals. We complement our work with numerical studies on simulated and real data.
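To make the binary-only restriction concrete, the following is a minimal sketch of the kind of ranking the original hdPS performs, assuming the Bross bias-multiplier formula commonly used there: for a binary covariate C, binary exposure A and binary outcome Y, the multiplier is (p1(RR - 1) + 1) / (p0(RR - 1) + 1), where p1 and p0 are the prevalences of C among exposed and unexposed units and RR is the outcome relative risk for C = 1 versus C = 0. Covariates are then ranked by |log(multiplier)|. The function names and toy data are hypothetical, and this is not the extension or the TMLE proposed here.

```python
import math


def bross_bias_multiplier(c, a, y):
    """Bross-formula bias multiplier for one binary covariate (hypothetical helper).

    c, a, y are equal-length lists of 0/1 values for the covariate,
    the exposure and the outcome. Every formula input must be binary,
    which is exactly the limitation the abstract points out.
    """
    exposed = [ci for ci, ai in zip(c, a) if ai == 1]
    unexposed = [ci for ci, ai in zip(c, a) if ai == 0]
    y_c1 = [yi for ci, yi in zip(c, y) if ci == 1]
    y_c0 = [yi for ci, yi in zip(c, y) if ci == 0]
    if not exposed or not unexposed or not y_c1 or sum(y_c0) == 0:
        raise ValueError("degenerate counts: multiplier is undefined")
    # Prevalence of the covariate among exposed (p1) and unexposed (p0) units
    p1 = sum(exposed) / len(exposed)
    p0 = sum(unexposed) / len(unexposed)
    # Relative risk of the outcome for C = 1 versus C = 0
    rr = (sum(y_c1) / len(y_c1)) / (sum(y_c0) / len(y_c0))
    return (p1 * (rr - 1) + 1) / (p0 * (rr - 1) + 1)


def rank_covariates(covariates, a, y, k):
    """Return the names of the k covariates with the largest |log(multiplier)|."""
    scores = {
        name: abs(math.log(bross_bias_multiplier(c, a, y)))
        for name, c in covariates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 8 units, exposure a, outcome y, three binary covariates
a = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 0, 0]
covariates = {
    "c_strong": [1, 1, 1, 0, 0, 0, 0, 0],  # tied to both exposure and outcome
    "c_mid": [1, 1, 0, 0, 1, 0, 0, 0],     # moderately imbalanced across exposure
    "c_noise": [1, 0, 1, 0, 1, 0, 1, 0],   # balanced across exposure groups
}

top_two = rank_covariates(covariates, a, y, k=2)
```

Here `c_noise` has the same prevalence in both exposure groups, so its multiplier is 1 and it ranks last even though it predicts the outcome. This illustrates why the score depends on both the exposure and the outcome associations. Because the formula is defined only through prevalences and a relative risk of binary quantities, it has no direct analogue for continuous confounders or responses. That gap is the one the proposed extension addresses.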