Estimating how a treatment affects units individually, known as heterogeneous treatment effect (HTE) estimation, is an essential part of decision-making and policy implementation. The accumulation of large amounts of data in many domains, such as healthcare and e-commerce, has led to increased interest in developing data-driven algorithms for estimating heterogeneous effects from observational and experimental data. However, these methods often make strong assumptions about the observed features and ignore the underlying causal model structure, which can lead to biased HTE estimation. At the same time, accounting for the causal structure of real-world data is rarely trivial since the causal mechanisms that gave rise to the data are typically unknown. To address this problem, we develop a feature selection method that considers each feature's value for HTE estimation and learns the relevant parts of the causal structure from data. We provide strong empirical evidence that our method improves existing data-driven HTE estimation methods under arbitrary underlying causal structures. Our results on synthetic, semi-synthetic, and real-world datasets show that our feature selection algorithm leads to lower HTE estimation error.