Estimation of the average treatment effect (ATE) is a central problem in causal inference. In recent years, inference for the ATE in the presence of high-dimensional covariates has been studied extensively. Among the diverse approaches that have been proposed, augmented inverse probability weighting (AIPW) with cross-fitting has emerged as a popular choice in practice. In this work, we study this cross-fit AIPW estimator under well-specified outcome regression and propensity score models in a high-dimensional regime where the number of features and the number of samples are both large and comparable. Under assumptions on the covariate distribution, we establish a new central limit theorem (CLT) for the suitably scaled cross-fit AIPW estimator that applies without any sparsity assumptions on the underlying high-dimensional parameters. Among other findings, our CLT uncovers two crucial phenomena: (i) the AIPW estimator exhibits substantial variance inflation that can be precisely quantified in terms of the signal-to-noise ratio and other problem parameters, and (ii) the asymptotic covariance between the pre-cross-fit estimators is non-negligible even on the root-n scale. These findings are strikingly different from their classical counterparts. On the technical front, our work utilizes a novel interplay between three distinct tools: approximate message passing theory, the theory of deterministic equivalents, and the leave-one-out approach. We believe our proof techniques should be useful for analyzing other two-stage estimators in this high-dimensional regime. Finally, we complement our theoretical results with simulations that demonstrate both the finite-sample efficacy of our CLT and its robustness to our assumptions.
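For readers unfamiliar with the estimator, a generic two-fold cross-fit AIPW procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the construction analyzed in the paper: the nuisance models (per-arm least-squares outcome regressions and a logistic propensity model fit by Newton's method), the propensity clipping to [0.01, 0.99], the helper names (`fit_linear`, `fit_logistic`, `cross_fit_aipw`) and the simulated data are all hypothetical choices made for this sketch. The simulation is also low-dimensional (p much smaller than n), not the proportional regime studied here.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares outcome regression with an intercept (hypothetical helper)."""
    Xa = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def fit_logistic(X, t, n_iter=25, ridge=1e-6):
    """Logistic propensity model via Newton-Raphson; a tiny ridge keeps H invertible."""
    Xa = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xa @ beta))
        grad = Xa.T @ (t - p) - ridge * beta  # gradient of penalized log-likelihood
        H = Xa.T @ (Xa * (p * (1 - p))[:, None]) + ridge * np.eye(Xa.shape[1])
        beta += np.linalg.solve(H, grad)  # Newton step
    return beta


def predict_logistic(beta, X):
    return 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))


def cross_fit_aipw(X, t, y, seed=0):
    """Two-fold cross-fit AIPW: nuisances fit on one fold, scores averaged on the other."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), 2)
    fold_estimates = []
    for k in range(2):
        est, nuis = folds[k], folds[1 - k]
        Xn, tn, yn = X[nuis], t[nuis], y[nuis]
        mu1 = fit_linear(Xn[tn == 1], yn[tn == 1])  # outcome model, treated arm
        mu0 = fit_linear(Xn[tn == 0], yn[tn == 0])  # outcome model, control arm
        ps = fit_logistic(Xn, tn)                   # propensity model
        Xe, te, ye = X[est], t[est], y[est]
        m1, m0 = predict_linear(mu1, Xe), predict_linear(mu0, Xe)
        pi = np.clip(predict_logistic(ps, Xe), 0.01, 0.99)  # assumed clipping
        # AIPW score: regression contrast plus inverse-propensity-weighted residuals
        psi = m1 - m0 + te * (ye - m1) / pi - (1 - te) * (ye - m0) / (1 - pi)
        fold_estimates.append(float(psi.mean()))    # one pre-cross-fit estimate
    return float(np.mean(fold_estimates)), fold_estimates


# Usage on simulated data with a true ATE of 2.0 (illustrative only).
rng = np.random.default_rng(1)
n, p = 4000, 5
X = rng.standard_normal((n, p))
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ (0.3 * rng.standard_normal(p)))))
y = X @ rng.standard_normal(p) + 2.0 * t + rng.standard_normal(n)
ate, fold_estimates = cross_fit_aipw(X, t, y)
print(ate, fold_estimates)
```

Each entry of `fold_estimates` is one pre-cross-fit estimate, and the cross-fit estimate is their average; point (ii) above concerns the covariance between these two quantities.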