Heterogeneous effect estimation plays a crucial role in causal inference, with applications across medicine and social science. Many methods for estimating conditional average treatment effects (CATEs) have been proposed in recent years, but there are important theoretical gaps in understanding whether and when such methods are optimal. This is especially true when the CATE has nontrivial structure (e.g., smoothness or sparsity). Our work makes three main contributions. First, we study a two-stage doubly robust CATE estimator and give a generic model-free error bound, which, despite its generality, yields sharper results than those in the current literature. We apply the bound to derive error rates in nonparametric models with smoothness or sparsity, and give sufficient conditions for oracle efficiency. Underlying our error bound is a general oracle inequality for regression with estimated or imputed outcomes, which is of independent interest; this is the second main contribution. The third contribution is aimed at understanding the fundamental statistical limits of CATE estimation. To that end, we propose and study a local polynomial adaptation of double-residual regression. We show that this estimator can be oracle efficient under even weaker conditions, if used with a specialized form of sample splitting and careful choices of tuning parameters. These are the weakest conditions currently found in the literature, and we conjecture that they are minimal in a minimax sense. We go on to give error bounds in the non-trivial regime where oracle rates cannot be achieved. We explore some finite-sample properties with simulations.
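The two-stage doubly robust estimator mentioned above (often referred to as the DR-Learner) can be sketched as follows: nuisance functions are fit on one half of the sample, and on the other half the doubly robust pseudo-outcome $\hat\varphi = \frac{A - \hat\pi(X)}{\hat\pi(X)\{1 - \hat\pi(X)\}}\{Y - \hat\mu_A(X)\} + \hat\mu_1(X) - \hat\mu_0(X)$ is regressed on the covariates. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes a binary treatment `A`, a single sample split, logistic and linear nuisance models, propensity clipping at 0.01, and a linear second-stage regression, whereas the analysis allows arbitrary regression methods in both stages. The helper names (`fit_linear`, `fit_logistic`, `dr_learner`, `simulate`) and the simulated data-generating process are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a prediction function."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def fit_logistic(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson; returns a probability predictor."""
    design = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        grad = design.T @ (a - p)
        hess = design.T @ (design * (p * (1 - p))[:, None])
        # Tiny ridge term keeps the Newton step numerically stable.
        beta = beta + np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
    return lambda Xn: 1.0 / (1.0 + np.exp(-np.column_stack([np.ones(len(Xn)), Xn]) @ beta))


def dr_learner(X, A, Y, rng, clip=0.01):
    """Two-stage doubly robust CATE estimate with one sample split (assumed setup)."""
    idx = rng.permutation(len(Y))
    s1, s2 = idx[: len(Y) // 2], idx[len(Y) // 2 :]

    # Stage 1: estimate nuisance functions on the first split only.
    X1, A1, Y1 = X[s1], A[s1], Y[s1]
    pi_hat = fit_logistic(X1, A1)
    mu0_hat = fit_linear(X1[A1 == 0], Y1[A1 == 0])
    mu1_hat = fit_linear(X1[A1 == 1], Y1[A1 == 1])

    # Stage 2: build doubly robust pseudo-outcomes on the second split ...
    X2, A2, Y2 = X[s2], A[s2], Y[s2]
    pi = np.clip(pi_hat(X2), clip, 1 - clip)
    mu0, mu1 = mu0_hat(X2), mu1_hat(X2)
    mu_a = np.where(A2 == 1, mu1, mu0)
    phi = (A2 - pi) / (pi * (1 - pi)) * (Y2 - mu_a) + mu1 - mu0

    # ... and regress them on the covariates to obtain the CATE estimate.
    return fit_linear(X2, phi)


def simulate(n, rng):
    """Hypothetical data-generating process with true CATE tau(x) = 1 + 2*x."""
    X = rng.normal(size=(n, 1))
    propensity = 1.0 / (1.0 + np.exp(-0.5 * X[:, 0]))
    A = rng.binomial(1, propensity)
    Y = X[:, 0] + A * (1 + 2 * X[:, 0]) + rng.normal(size=n)
    return X, A, Y
```

For example, `tau_hat = dr_learner(*simulate(4000, rng), rng)` with `rng = np.random.default_rng(0)` returns a function whose values on a grid of `x` should lie close to `1 + 2x`. Replacing the linear second-stage fit with a smoothing or sparse regression is one natural place where structural assumptions such as smoothness or sparsity could enter.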