A New Central Limit Theorem for the Augmented IPW Estimator: Variance Inflation, Cross-Fit Covariance and Beyond

From arXiv. 132 pages, 7 figures. In V2, we added extensive comparisons with the classical variance formula (cf. Sec. 3, Fig. 2, Fig. 4) and further elaborated on the non-trivial cross-fit covariance phenomenon.

Estimation of the average treatment effect (ATE) is a central problem in causal inference. In recent times, inference for the ATE in the presence of high-dimensional covariates has been extensively studied. Among the diverse approaches that have been proposed, augmented inverse probability weighting (AIPW) with cross-fitting has emerged as a popular choice in practice. In this work, we study this cross-fit AIPW estimator under well-specified outcome regression and propensity score models in a high-dimensional regime where the numbers of features and samples are both large and comparable. Under assumptions on the covariate distribution, we establish a new central limit theorem for the suitably scaled cross-fit AIPW that applies without any sparsity assumptions on the underlying high-dimensional parameters. Among other things, our CLT uncovers two crucial phenomena: (i) the AIPW exhibits a substantial variance inflation that can be precisely quantified in terms of the signal-to-noise ratio and other problem parameters, and (ii) the asymptotic covariance between the pre-cross-fit estimators is non-negligible even on the root-n scale. These findings are strikingly different from their classical counterparts. On the technical front, our work utilizes a novel interplay between three distinct tools: approximate message passing theory, the theory of deterministic equivalents, and the leave-one-out approach. We believe our proof techniques should be useful for analyzing other two-stage estimators in this high-dimensional regime. Finally, we complement our theoretical results with simulations that demonstrate both the finite sample efficacy of our CLT and its robustness to our assumptions.
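For readers unfamiliar with the estimator, the following is a minimal sketch of a two-fold cross-fit AIPW, not the paper's exact construction. It assumes a logistic propensity model fitted by Newton's method, a separate linear outcome regression per treatment arm, and an equal two-way sample split; the function names (`fit_logistic`, `aipw_on_fold`, `cross_fit_aipw`, `simulate`), the propensity clipping, and the small ridge term are all illustrative choices that do not come from the abstract. The toy data are low-dimensional, so the sketch does not reproduce the variance inflation that arises when the numbers of features and samples are comparable.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_logistic(X, a, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton's method (tiny ridge for numerical stability)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (a - p) - ridge * beta
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def fit_ols(X, y):
    """Least-squares coefficients for a linear outcome model."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def aipw_on_fold(X_tr, a_tr, y_tr, X_ev, a_ev, y_ev, clip=0.01):
    """Fit nuisance models on the training fold, evaluate AIPW on the other fold."""
    beta = fit_logistic(X_tr, a_tr)
    g1 = fit_ols(X_tr[a_tr == 1], y_tr[a_tr == 1])  # outcome model, treated arm
    g0 = fit_ols(X_tr[a_tr == 0], y_tr[a_tr == 0])  # outcome model, control arm
    e = np.clip(sigmoid(X_ev @ beta), clip, 1.0 - clip)  # clipping is an assumption
    m1, m0 = X_ev @ g1, X_ev @ g0
    psi = (m1 - m0
           + a_ev * (y_ev - m1) / e
           - (1 - a_ev) * (y_ev - m0) / (1.0 - e))
    return psi.mean()

def cross_fit_aipw(X, a, y, rng):
    """Average the two single-split estimates obtained by swapping fold roles."""
    perm = rng.permutation(len(y))
    half = len(y) // 2
    i1, i2 = perm[:half], perm[half:]
    est1 = aipw_on_fold(X[i1], a[i1], y[i1], X[i2], a[i2], y[i2])
    est2 = aipw_on_fold(X[i2], a[i2], y[i2], X[i1], a[i1], y[i1])
    return 0.5 * (est1 + est2), est1, est2

def simulate(n, p, tau, rng):
    """Hypothetical toy data: Gaussian covariates, both models well specified."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept column
    beta = np.concatenate([[0.0], 0.5 * np.ones(p) / np.sqrt(p)])
    alpha = np.concatenate([[0.0], np.ones(p) / np.sqrt(p)])
    a = rng.binomial(1, sigmoid(X @ beta))
    y = X @ alpha + tau * a + rng.normal(size=n)  # true ATE equals tau
    return X, a, y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, a, y = simulate(n=4000, p=5, tau=2.0, rng=rng)
    ate, est1, est2 = cross_fit_aipw(X, a, y, rng)
    print("cross-fit AIPW:", ate, "single-split estimates:", est1, est2)
```

In this sketch, `est1` and `est2` are stand-ins for the single-split estimates that are averaged into the cross-fit estimator. Finding (ii) of the abstract concerns the covariance between such estimates, which the toy setting here is not designed to exhibit.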
