Many causal estimators for Conditional Average Treatment Effect (CATE) and instrumental variable (IV) problems have recently been published and open-sourced, making it possible to estimate the granular impact of both randomized treatments (such as A/B tests) and user choices on the outcomes of interest. However, the practical application of such models has been hampered by the lack of a valid way to score their performance out of sample in order to select the best one for a given application. We address that gap by proposing novel scoring approaches for both the CATE case and an important subset of instrumental variable problems, namely those where the instrumental variable is customer access to a product feature and the treatment is the customer's choice to use that feature. Being able to score model performance out of sample allows us to apply hyperparameter optimization methods to causal model selection and tuning. We implement this in an open-source package that relies on the DoWhy and EconML libraries for the implementation of causal inference models (and also includes a Transformed Outcome model implementation), and on FLAML for hyperparameter optimization and for the component models used in the causal models. We demonstrate on synthetic data that optimizing the proposed scores is a reliable method for choosing a model and hyperparameter values whose estimates are close to the true impact, in the randomized CATE and IV cases. Further, we provide examples of applying these methods to real customer data from Wise.