Anecdotally, using an estimated propensity score is superior to using the true propensity score when estimating the average treatment effect from observational data. However, this claim comes with several qualifications: it holds only if the propensity score model is correctly specified and the number of covariates $d$ is small relative to the sample size $n$. We revisit this phenomenon by studying the inverse propensity score weighting (IPW) estimator based on a logistic model with a diverging number of covariates. We first show that the IPW estimator based on the estimated propensity score is consistent and asymptotically normal, with smaller variance than the oracle IPW estimator (which uses the true propensity score), if and only if $n \gtrsim d^2$. We then propose a debiased IPW estimator that achieves the same guarantees in the regime $n \gtrsim d^{3/2}$. Our proofs rely on a novel non-asymptotic decomposition of the IPW error, along with careful control of the higher-order terms.
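The variance comparison described above can be illustrated with a small Monte Carlo sketch. This is not the paper's experiment or its debiased estimator: the data-generating process, the coefficient values, the inclusion of an intercept, the Horvitz-Thompson form of the IPW estimator, and all function names (`fit_logistic`, `ipw_ate`, `simulate`) are assumptions chosen for illustration. The logistic model is correctly specified and $d$ is small relative to $n$, which is the setting where the estimated propensity score is expected to help.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, t, n_iter=25):
    """Unpenalized logistic-regression MLE via Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (t - p)                          # score of the log-likelihood
        hess = (X * (p * (1.0 - p))[:, None]).T @ X   # Fisher information
        beta += np.linalg.solve(hess, grad)
    return beta


def ipw_ate(y, t, pi):
    """IPW (Horvitz-Thompson form) estimate of the average treatment effect."""
    return np.mean(t * y / pi - (1 - t) * y / (1.0 - pi))


def simulate(n=2000, d=5, reps=200, tau=1.0, seed=0):
    """Return Monte Carlo (oracle_sd, estimated_sd, estimated_mean)."""
    rng = np.random.default_rng(seed)
    # Assumed true coefficients: zero intercept, slopes with Euclidean norm 0.5.
    beta_true = np.concatenate([[0.0], np.full(d, 0.5 / np.sqrt(d))])
    oracle, estimated = [], []
    for _ in range(reps):
        # Design matrix with an intercept column followed by d Gaussian covariates.
        X = np.column_stack([np.ones(n), rng.standard_normal((n, d))])
        pi_true = sigmoid(X @ beta_true)
        t = rng.binomial(1, pi_true)
        # Assumed outcome model: linear in the covariates, constant effect tau.
        y = X[:, 1:].sum(axis=1) / np.sqrt(d) + tau * t + rng.standard_normal(n)
        pi_hat = sigmoid(X @ fit_logistic(X, t))
        oracle.append(ipw_ate(y, t, pi_true))      # uses the true propensity score
        estimated.append(ipw_ate(y, t, pi_hat))    # uses the fitted propensity score
    return np.std(oracle), np.std(estimated), np.mean(estimated)


if __name__ == "__main__":
    oracle_sd, est_sd, est_mean = simulate()
    print(f"oracle IPW sd:    {oracle_sd:.4f}")
    print(f"estimated IPW sd: {est_sd:.4f}")
    print(f"estimated IPW mean (true ATE = 1): {est_mean:.4f}")
```

Under these assumptions, the spread of the estimated-score IPW estimates should come out smaller than that of the oracle estimates. Increasing `d` toward the order of $\sqrt{n}$ and beyond is one way to probe informally where this advantage starts to break down.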
Title: When is the estimated propensity score better? High-dimensional analysis and bias correction