Estimating treatment effects is one of the most challenging and important tasks for data analysts. In many applications, such as online marketing and personalized medicine, treatment needs to be allocated to the individuals for whom it yields a high positive treatment effect. Uplift models help select the right individuals for treatment and thereby maximize the overall treatment effect (uplift). A major challenge in uplift modeling concerns model evaluation. Previous literature suggests metrics such as the Qini curve and the transformed outcome mean squared error. However, these metrics suffer from high variance: their evaluations are strongly affected by random noise in the data, which renders their signals, to a certain degree, arbitrary. We theoretically analyze the variance of uplift evaluation metrics and derive possible methods of variance reduction based on statistical adjustment of the outcome. We derive simple conditions under which the variance reduction methods improve the uplift evaluation metrics and empirically demonstrate their benefits on simulated and real-world data. Our paper provides strong evidence in favor of applying the suggested variance reduction procedures by default when evaluating uplift models on data from randomized controlled trials (RCTs).
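To illustrate what "statistical adjustment of the outcome" can look like for the transformed outcome metric, here is a minimal sketch. It is not the paper's procedure. It assumes a completely randomized experiment with a known treatment probability `p`, a hypothetical linear data-generating process, a fixed hypothetical uplift model `tau_hat`, and a hypothetical outcome predictor `m(x)` that does not depend on treatment. The transformed outcome is `Y* = (Y - m(X)) * (W - p) / (p * (1 - p))`. Subtracting any such `m(X)` keeps `E[Y* | X] = tau(X)`, so the metric stays unbiased for model comparison, while the spread of the metric across repeated samples can shrink considerably.

```python
import numpy as np


def transformed_outcome(y, w, p, m=None):
    """Transformed outcome Y* = (Y - m) * (W - p) / (p * (1 - p)).

    With m=None the raw outcome is used. Subtracting a predictor m(X) that
    does not depend on W leaves E[Y* | X] = tau(X) unchanged in an RCT with
    known treatment probability p (outcome adjustment, assumed form).
    """
    y_adj = y if m is None else y - m
    return y_adj * (w - p) / (p * (1.0 - p))


def to_mse(tau_hat, y, w, p, m=None):
    """Transformed-outcome MSE of the uplift predictions tau_hat."""
    y_star = transformed_outcome(y, w, p, m)
    return np.mean((y_star - tau_hat) ** 2)


def simulate(n, rng, p=0.5):
    """Hypothetical RCT: baseline 2 + 3x, true uplift 0.5 + 0.5x, unit noise."""
    x = rng.normal(size=n)
    w = rng.binomial(1, p, size=n)
    mu0 = 2.0 + 3.0 * x
    tau = 0.5 + 0.5 * x
    y = mu0 + tau * w + rng.normal(size=n)
    return x, w, y


rng = np.random.default_rng(0)
p = 0.5
raw, adj = [], []
for _ in range(500):  # repeated evaluation samples of the same model
    x, w, y = simulate(2000, rng, p)
    tau_hat = 0.4 + 0.6 * x  # hypothetical uplift model being evaluated
    m = 2.0 + 3.0 * x + p * (0.5 + 0.5 * x)  # hypothetical outcome predictor
    raw.append(to_mse(tau_hat, y, w, p))
    adj.append(to_mse(tau_hat, y, w, p, m))

print("std of metric, raw outcome:     ", np.std(raw))
print("std of metric, adjusted outcome:", np.std(adj))
```

The adjustment shifts the expected MSE by a term that does not depend on `tau_hat`, so the absolute level of the metric changes but the expected ranking of competing uplift models does not. Only the sampling noise of the evaluation is affected.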