What Makes Forest-Based Heterogeneous Treatment Effect Estimators Work?

Estimation of heterogeneous treatment effects (HTE) is of prime importance in many disciplines, from personalized medicine to economics. Random forests have been shown to be a flexible and powerful approach to HTE estimation in both randomized trials and observational studies. In particular, "causal forests", introduced by Athey, Tibshirani and Wager (2019), along with their R implementation in the package grf, were rapidly adopted. A related approach, called "model-based forests", which is geared towards randomized trials and simultaneously captures the effects of both prognostic and predictive variables, was introduced by Seibold, Zeileis and Hothorn (2018) along with a modular implementation in the R package model4you. Here, we present a unifying view that goes beyond the theoretical motivations and investigates which computational elements make causal forests so successful and how these can be blended with the strengths of model-based forests. To do so, we show that both methods can be understood in terms of the same parameters and model assumptions for an additive model under L2 loss. This theoretical insight allows us to implement several flavors of "model-based causal forests" and dissect their different elements in silico. The original causal forests and model-based forests are compared with the new blended versions in a benchmark study exploring both randomized trials and observational settings. In the randomized setting, both approaches performed similarly. When confounding was present in the data-generating process, we found local centering of the treatment indicator with the corresponding propensities to be the main driver of good performance. Local centering of the outcome was less important, and might be replaced or enhanced by simultaneous split selection with respect to both prognostic and predictive effects.
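To illustrate why centering the treatment indicator matters under confounding, the following is a minimal sketch and not the implementation used in grf or model4you. It assumes a simulated additive model Y = mu(x) + tau * W + error with a constant treatment effect tau, and it uses the true propensity e(x) = P(W = 1 | x) and the true conditional mean m(x) = E[Y | x] as oracle quantities; in practice both would have to be estimated. All function names and the data-generating process are hypothetical choices made for this example.

```python
import numpy as np


def simulate(n=20000, tau=1.0, seed=0):
    """Simulate confounded data from an assumed additive model (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    e = 1.0 / (1.0 + np.exp(-2.0 * x))   # propensity P(W = 1 | x), depends on x
    w = rng.binomial(1, e)               # treatment indicator, confounded by x
    mu = 3.0 * x                         # prognostic effect of x
    y = mu + tau * w + rng.normal(size=n)
    m = mu + tau * e                     # oracle conditional mean E[Y | x]
    return x, w, y, e, m


def naive_estimate(w, y):
    """Difference in group means; biased when x drives both W and Y."""
    return y[w == 1].mean() - y[w == 0].mean()


def centered_estimate(w, y, e, m):
    """L2 fit of (Y - m(x)) on (W - e(x)): local centering of outcome and treatment."""
    w_c = w - e
    y_c = y - m
    return np.sum(w_c * y_c) / np.sum(w_c ** 2)


if __name__ == "__main__":
    x, w, y, e, m = simulate()
    print("naive estimate:   ", naive_estimate(w, y))
    print("centered estimate:", centered_estimate(w, y, e, m))
```

In this toy setting the naive contrast absorbs the prognostic effect of x, whereas the centered fit recovers a value close to tau. The sketch covers only the constant-effect case; forest-based estimators apply the same idea locally to obtain tau(x).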
