Semiparametric counterfactual density estimation

Causal effects are often characterized with averages, which can give an incomplete picture of the underlying counterfactual distributions. Here we consider estimating the entire counterfactual density and generic functionals thereof. We focus on two kinds of target parameters. The first is a density approximation, defined by a projection onto a finite-dimensional model using a generalized distance metric, which includes f-divergences as well as $L_p$ norms. The second is the distance between counterfactual densities, which can be used as a more nuanced effect measure than the mean difference, and as a tool for model selection. We study nonparametric efficiency bounds for these targets, giving results for smooth but otherwise generic models and distances. Importantly, we show how these bounds connect to means of particular non-trivial functions of counterfactuals, linking the problems of density and mean estimation. We go on to propose doubly robust-style estimators for the density approximations and distances, and study their rates of convergence, showing they can be optimally efficient in large nonparametric models. We also give analogous methods for model selection and aggregation, when many models may be available and of interest. Our results all hold for generic models and distances, but throughout we highlight what happens for particular choices, such as $L_2$ projections on linear models, and KL projections on exponential families. Finally we illustrate by estimating the density of CD4 count among patients with HIV, had all been treated with combination therapy versus zidovudine alone, as well as a density effect. Our results suggest combination therapy may have increased CD4 count most for high-risk patients. Our methods are implemented in the freely available R package npcausal on GitHub.
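
To make the link between density and mean estimation concrete, consider the $L_2$ projection onto a linear model $g(y;\beta) = \beta^\top b(y)$. Minimizing $\int \{p_a(y) - \beta^\top b(y)\}^2 \, dy$ over $\beta$ gives $\beta = (\int b b^\top)^{-1} E\{b(Y^a)\}$, so with an orthonormal basis the coefficients are simply counterfactual means of the basis functions. The Python sketch below illustrates this under strong simplifying assumptions that are not taken from the paper: a cosine basis on $[0,1]$, a single binary covariate, stratum-wise sample averages as nuisance estimators, no sample splitting, and simulated data. The function names are hypothetical and this is not the npcausal interface.

```python
import numpy as np

def cosine_basis(y, k):
    """Orthonormal cosine basis on [0, 1]: b_0 = 1, b_j(y) = sqrt(2) cos(pi j y)."""
    y = np.asarray(y, dtype=float)
    cols = [np.ones_like(y)]
    cols += [np.sqrt(2.0) * np.cos(np.pi * j * y) for j in range(1, k)]
    return np.stack(cols, axis=-1)

def dr_l2_projection(y, a, x, k, arm=1):
    """Doubly robust-style estimate of beta = E[b(Y^arm)].

    Assumes x is a discrete covariate, so the propensity score and the
    regression of b(Y) on (x, a) are estimated by stratum-wise averages.
    """
    B = cosine_basis(y, k)                      # n x k matrix of b(Y_i)
    terms = np.zeros_like(B)
    for level in np.unique(x):
        idx = x == level
        in_arm = idx & (a == arm)
        pi_hat = in_arm.sum() / idx.sum()       # P(A = arm | X = level)
        mu_hat = B[in_arm].mean(axis=0)         # E[b(Y) | X = level, A = arm]
        weight = (a[idx] == arm)[:, None] / pi_hat
        # inverse-probability-weighted residual plus regression prediction
        terms[idx] = weight * (B[idx] - mu_hat) + mu_hat
    return terms.mean(axis=0)

# Simulated (hypothetical) data with confounding through x.
rng = np.random.default_rng(0)
n = 4000
x = rng.integers(0, 2, size=n)
a = rng.binomial(1, 0.3 + 0.4 * x)
y1 = rng.beta(2 + x, 2)          # potential outcome under treatment
y0 = rng.beta(2, 2 + x)          # potential outcome under control
y = np.where(a == 1, y1, y0)     # observed outcome

k = 4
beta_hat = dr_l2_projection(y, a, x, k, arm=1)

# Projected counterfactual density evaluated on a midpoint grid over [0, 1].
grid = (np.arange(200) + 0.5) / 200
density_hat = cosine_basis(grid, k) @ beta_hat
```

Because the first basis function is constant, the first coefficient equals one and the fitted curve integrates to one, although an $L_2$ projection onto a linear model is not guaranteed to be nonnegative everywhere. In realistic settings with continuous covariates, the stratum averages would be replaced by flexible regression estimators, typically combined with sample splitting.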
