Causal inference from observational data requires untestable assumptions. When these assumptions hold, machine learning (ML) methods can be used to study complex forms of causal-effect heterogeneity. Several ML methods have recently been developed to estimate the conditional average treatment effect (CATE). If the features at hand cannot explain all heterogeneity, the individual treatment effects (ITEs) can deviate substantially from the CATE. In this work, we demonstrate how the distributions of the ITE and the estimated CATE can differ when a causal random forest (CRF) is applied. We extend the CRF to estimate the difference in conditional variance between treated and control units. If the ITE distribution equals the CATE distribution, this difference in variance should be small. If the two distributions differ, an additional causal assumption is necessary to quantify the heterogeneity not captured by the CATE distribution. The conditional variance of the ITE can be identified when the individual effect is independent of the outcome under no treatment, given the measured features. In cases where the ITE and CATE distributions differ, the extended CRF can then appropriately estimate the characteristics of the ITE distribution, while the standard CRF fails to do so.
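The identification argument can be illustrated with a small simulation. Writing the outcome under treatment as the outcome under no treatment plus the ITE, the conditional variance among the treated equals the conditional variance among the controls, plus the conditional variance of the ITE, plus twice their conditional covariance. The covariance term vanishes under the independence assumption, so the difference in conditional variance recovers the conditional variance of the ITE. The sketch below is not the extended CRF: it assumes a single binary feature, a randomized treatment, and Gaussian noise, and it replaces the forest with a plain per-stratum estimator. The function name `simulate_variance_gap` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_variance_gap(n=200_000, seed=0):
    """Compare the treated-vs-control variance gap with the true ITE variance.

    Illustrative assumptions (not taken from the paper): one binary
    feature x, randomized treatment a, and an ITE that is independent
    of the untreated outcome y0 given x.
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n)   # single binary feature
    a = rng.integers(0, 2, size=n)   # randomized treatment indicator

    # Outcome under no treatment: mean 1 + x, unit variance.
    y0 = 1.0 + x + rng.normal(0.0, 1.0, size=n)

    # ITE with mean (the CATE) 2 + x and standard deviation 0.5 + x,
    # drawn independently of y0 within each level of x.
    ite = 2.0 + x + (0.5 + x) * rng.normal(0.0, 1.0, size=n)

    # Observed outcome: y0 for controls, y0 + ite for treated units.
    y = y0 + a * ite

    results = {}
    for v in (0, 1):
        treated = (x == v) & (a == 1)
        control = (x == v) & (a == 0)
        results[v] = {
            # Difference in means estimates the CATE in this stratum.
            "cate_hat": y[treated].mean() - y[control].mean(),
            # Difference in variances estimates Var(ITE | x = v)
            # under the independence assumption.
            "ite_var_hat": y[treated].var(ddof=1) - y[control].var(ddof=1),
            "ite_var_true": (0.5 + v) ** 2,
        }
    return results


if __name__ == "__main__":
    for v, stats in simulate_variance_gap().items():
        print(v, stats)
```

In this setup the CATE takes only two values, so its distribution has far less spread than the ITE distribution; the variance gap is what reveals the remaining heterogeneity. If the ITE were correlated with the untreated outcome, the same gap would also contain the covariance term and would no longer equal the ITE variance.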