Estimating causal effects is a core objective of many scientific disciplines, yet it remains challenging, especially when the effects must be estimated from observational data. Recently, several promising machine learning models have been proposed for causal effect estimation. These models have so far been evaluated by the mean error in the Average Treatment Effect (ATE) and the mean Precision in Estimation of Heterogeneous Effect (PEHE). In this paper, we propose to complement this evaluation with concrete statistical evidence, including the performance profiles of Dolan and Moré as well as non-parametric and post-hoc statistical tests. The main motivation is to eliminate the influence of a small number of instances or simulations that, in some cases, dominate the benchmarking results. We use the proposed evaluation methodology to compare several state-of-the-art causal effect estimation models.
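To illustrate the kind of evidence a Dolan and Moré performance profile provides, the following is a minimal sketch rather than the evaluation code used in the paper. It assumes a matrix of strictly positive errors (for example, PEHE values) with one row per benchmark instance and one column per model. The function name `performance_profile` and the toy numbers are hypothetical.

```python
import numpy as np

def performance_profile(errors, taus):
    """Dolan-More performance profile (illustrative sketch).

    errors: array of shape (n_instances, n_models); lower is better and
            every entry is assumed to be strictly positive.
    taus:   1-D array of ratio thresholds (each >= 1).
    Returns an array rho of shape (len(taus), n_models), where rho[k, s]
    is the fraction of instances on which model s is within a factor
    taus[k] of the best model for that instance.
    """
    errors = np.asarray(errors, dtype=float)
    taus = np.asarray(taus, dtype=float)
    # Best (smallest) error achieved on each instance by any model.
    best = errors.min(axis=1, keepdims=True)
    # Performance ratio of each model relative to the per-instance best.
    ratios = errors / best
    # Compare every ratio to every threshold, then average over instances.
    return (ratios[None, :, :] <= taus[:, None, None]).mean(axis=1)

# Toy data (hypothetical): 3 instances, 2 models.
toy_errors = [[1.0, 2.0],
              [2.0, 1.0],
              [1.0, 4.0]]
rho = performance_profile(toy_errors, taus=[1.0, 2.0, 4.0])
print(rho)
```

Because each instance contributes only through its ratio to the per-instance best, a single instance with an extreme error shifts the profile by at most 1/n, whereas it can dominate a mean.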