Estimating how a treatment affects different individuals, known as heterogeneous treatment effect estimation, is an important problem in the empirical sciences. In the last few years, there has been considerable interest in adapting machine learning algorithms to estimate heterogeneous effects from observational and experimental data. However, these algorithms often make strong assumptions about the observed features and ignore the structure of the underlying causal model, which can lead to biased estimates. At the same time, the underlying causal mechanism is rarely known for real-world datasets, which makes it hard to account for. In this work, we survey state-of-the-art data-driven methods for heterogeneous treatment effect estimation using machine learning, broadly categorizing them as methods that focus on counterfactual prediction and methods that directly estimate the causal effect. We also provide an overview of a third category of methods that rely on structural causal models and learn the model structure from data. Our empirical evaluation under various underlying structural model mechanisms shows the advantages and deficiencies of existing estimators and of the metrics used to measure their performance.
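To make the counterfactual-prediction category concrete, the following is a minimal sketch and not a method taken from the survey itself. It assumes a two-model ("T-learner") estimator with ordinary least squares as the base learner, a randomized binary treatment, and synthetic data whose true effect is known. The helper names (`fit_linear`, `predict_linear`, `t_learner_cate`, `simulate`) and the data-generating process are hypothetical, chosen only for illustration. Accuracy is reported as the root of the precision in estimation of heterogeneous effect (PEHE), one common metric for this task.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict outcomes from coefficients produced by fit_linear."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef


def t_learner_cate(X, t, y):
    """Fit one outcome model per treatment arm and return the difference of
    their predictions for every unit (the estimated individual-level effect)."""
    coef_treated = fit_linear(X[t == 1], y[t == 1])
    coef_control = fit_linear(X[t == 0], y[t == 0])
    return predict_linear(coef_treated, X) - predict_linear(coef_control, X)


def simulate(n=2000, seed=0):
    """Hypothetical data: randomized treatment, effect that varies with X[:, 0]."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    t = rng.integers(0, 2, size=n)      # coin-flip treatment assignment
    tau = 1.0 + 2.0 * X[:, 0]           # true heterogeneous effect
    y = X[:, 1] + t * tau + rng.normal(scale=0.1, size=n)
    return X, t, y, tau


X, t, y, tau = simulate()
tau_hat = t_learner_cate(X, t, y)

# Root PEHE: root mean squared error between estimated and true effects.
root_pehe = np.sqrt(np.mean((tau_hat - tau) ** 2))
print(f"root PEHE: {root_pehe:.3f}")
```

In this sketch both outcome models are correctly specified and treatment is randomized, so the estimate should be close to the truth. With observational data or a misspecified base learner the same procedure can be biased, which is the failure mode the abstract describes.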