Evaluating treatment effect heterogeneity widely informs treatment decision-making. Currently, much emphasis is placed on estimating the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This is troubling because assessing risk is crucial for reliable decision-making in sensitive and uncertain environments. In this work, we propose a conformal inference-based approach that produces reliable interval estimates for counterfactuals and individual treatment effects under the potential outcome framework. For completely randomized or stratified randomized experiments with perfect compliance, the intervals have guaranteed average coverage in finite samples regardless of the unknown data generating mechanism. For randomized experiments with ignorable compliance and general observational studies obeying the strong ignorability assumption, the intervals satisfy a doubly robust property: the average coverage is approximately controlled if either the propensity score or the conditional quantiles of potential outcomes can be estimated accurately. Numerical studies on both synthetic and real datasets empirically demonstrate that existing methods suffer from a significant coverage deficit even in simple models. In contrast, our methods achieve the desired coverage with reasonably short intervals.
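To make the finite-sample coverage claim concrete, the following is a minimal sketch of plain split conformal prediction for the counterfactual Y(1) in a simulated completely randomized experiment. It is not the procedure proposed in this work, which is weighted and quantile-based in order to handle observational data. The function name `split_conformal_interval`, the linear least-squares fit, the absolute-residual score, and the simulated data are all assumptions made for illustration. The sketch relies only on the fact that, under complete randomization, treated units have the same covariate distribution as the target population, so no reweighting is needed.

```python
import numpy as np


def split_conformal_interval(x_train, y_train, x_calib, y_calib, x_new, alpha=0.1):
    """Split conformal intervals from absolute residuals of a linear fit.

    Illustrative only: the model and score are assumptions, not the
    method proposed in the abstract above.
    """
    # Fit a least-squares linear model (with intercept) on the training fold.
    design = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)

    def predict(x):
        return np.column_stack([np.ones(len(x)), x]) @ coef

    # Conformity scores on the held-out calibration fold.
    scores = np.sort(np.abs(y_calib - predict(x_calib)))
    n = len(scores)
    # Take the ceil((1 - alpha) * (n + 1))-th smallest score as the half-width.
    k = int(np.ceil((1 - alpha) * (n + 1)))
    half_width = scores[k - 1] if k <= n else np.inf

    center = predict(x_new)
    return center - half_width, center + half_width


rng = np.random.default_rng(0)


def simulate(n):
    """Hypothetical data: covariates and the potential outcome Y(1)."""
    x = rng.normal(size=(n, 2))
    noise = (1.0 + 0.5 * np.abs(x[:, 1])) * rng.normal(size=n)
    y1 = 1.0 + 2.0 * x[:, 0] + noise
    return x, y1


# Completely randomized experiment: Y(1) is observed only for treated units.
x, y1 = simulate(4000)
treated = rng.random(4000) < 0.5
x_t, y_t = x[treated], y1[treated]

# Split the treated units into a training fold and a calibration fold.
half = len(y_t) // 2
x_new, y1_new = simulate(4000)  # fresh units whose Y(1) we want to cover
lower, upper = split_conformal_interval(
    x_t[:half], y_t[:half], x_t[half:], y_t[half:], x_new, alpha=0.1
)

# Empirical average coverage of Y(1) on the fresh units.
coverage = np.mean((y1_new >= lower) & (y1_new <= upper))
print(f"empirical coverage of Y(1): {coverage:.3f}")
```

With `alpha=0.1`, the empirical coverage should land close to the nominal 90% level even though the linear model ignores the heteroscedastic noise. The guarantee is on average coverage only, so intervals may be too wide for some covariate values and too narrow for others.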