We consider estimation of average treatment effects given observational data with high-dimensional pretreatment variables. Existing methods for this problem typically assume some form of sparsity for the regression functions. In this work, we introduce a debiased inverse propensity score weighting (DIPW) scheme for average treatment effect estimation that delivers $\sqrt{n}$-consistent estimates when the propensity score follows a sparse logistic regression model; the outcome regression functions are permitted to be arbitrarily complex. We further demonstrate how confidence intervals centred on our estimates may be constructed. Our theoretical results quantify the price to pay for permitting the regression functions to be unestimable: under mild conditions, the variance of the estimator is inflated by a constant factor relative to the semiparametric efficient variance. We also show that when the outcome regressions can be estimated at a rate faster than the slow rate $1/\sqrt{\log n}$, our estimator achieves semiparametric efficiency. As our results accommodate arbitrary outcome regression functions, averages of transformed responses under each treatment may also be estimated at the $\sqrt{n}$ rate. Thus, for example, the variances of the potential outcomes may be estimated. We discuss extensions to estimating linear projections of the heterogeneous treatment effect function and explain how propensity score models with more general link functions may be handled within our framework. An R package \texttt{dipw} implementing our methodology is available on CRAN.
Title: Debiased inverse propensity score weighting for estimation of average treatment effects with high-dimensional confounders
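For orientation, the following is a minimal sketch of the classical inverse propensity score weighting (IPW) estimator that the DIPW scheme builds on. It is not the authors' method: there is no debiasing step, the logistic propensity model is fitted without a sparsity penalty, and the setting is low-dimensional. The function names (`fit_logistic`, `ipw_ate`, `simulate`), the simulated data-generating process and the clipping threshold are all assumptions made for illustration, and none of them reflect the interface of the \texttt{dipw} package.

```python
import numpy as np


def fit_logistic(X, t, n_iter=25, ridge=1e-8):
    """Fit an unpenalised logistic regression of t on X by Newton's method.

    Returns the coefficient vector, with the intercept first.
    """
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (t - p)
        # A tiny ridge term keeps the Hessian numerically invertible.
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def ipw_ate(X, t, y, clip=1e-3):
    """Self-normalised (Hajek) IPW estimate of the average treatment effect."""
    beta = fit_logistic(X, t)
    Z = np.column_stack([np.ones(len(X)), X])
    # Clip estimated propensities away from 0 and 1 (illustrative threshold).
    pi_hat = np.clip(1.0 / (1.0 + np.exp(-Z @ beta)), clip, 1.0 - clip)
    w1 = t / pi_hat
    w0 = (1.0 - t) / (1.0 - pi_hat)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


def simulate(n=20000, p=5, seed=0):
    """Hypothetical data: logistic propensity, nonlinear outcome, true ATE = 2."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))
    pi = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
    t = rng.binomial(1, pi).astype(float)
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 2.0 * t + rng.normal(size=n)
    return X, t, y


X, t, y = simulate()
ate_hat = ipw_ate(X, t, y)
naive = y[t == 1].mean() - y[t == 0].mean()
print(f"IPW estimate: {ate_hat:.3f}, unadjusted difference in means: {naive:.3f}")
```

The sketch uses only the propensity model and never fits the outcome regression, which is the feature of weighting estimators that the abstract relies on when it allows arbitrarily complex outcome regressions. In the high-dimensional regime considered in the paper, a penalised propensity fit introduces bias into plain IPW, and correcting that bias is what the debiasing step addresses.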