Estimating treatment effects is one of the most challenging and important tasks for data analysts. Traditional statistical methods aim to estimate average treatment effects over a population. While highly useful, such average treatment effects do not help to decide which individuals benefit most from the treatment. This is where uplift modeling becomes important. Uplift models help to select the right individuals for treatment in order to maximize the overall treatment effect (uplift). A challenging problem in uplift modeling is model evaluation. Previous literature suggests metrics such as the Qini curve and the transformed outcome mean squared error. However, these metrics suffer from high variance: their values are strongly affected by random noise in the data, which makes the resulting evaluations to a certain degree arbitrary. In this paper, we analyze the variance of uplift evaluation metrics on randomized controlled trial data in a statistically sound manner. We propose outcome adjustment methods and show, both theoretically and empirically, that they reduce the variance of the uplift evaluation metrics. Our statistical analysis and the proposed outcome adjustment methods are a step towards better evaluation practice in uplift modeling.
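To make the variance issue concrete, the following is a minimal sketch of the transformed outcome mean squared error on synthetic randomized trial data, together with one simple form an outcome adjustment can take. Everything in it is an assumption made for illustration and is not taken from the paper: the data-generating process, the treatment probability `p = 0.5`, the helper names `transformed_outcome` and `transformed_outcome_mse`, and the choice to subtract a linear fit of the outcome on the covariate. The idea is that, in a randomized trial, subtracting any function of the covariates from the outcome leaves the conditional expectation of the transformed outcome unchanged, while it can remove much of its noise.

```python
import numpy as np

def transformed_outcome(y, w, p):
    """Z = Y * (W - p) / (p * (1 - p)). With random assignment, E[Z | X] is the uplift."""
    return y * (w - p) / (p * (1.0 - p))

def transformed_outcome_mse(y, w, p, uplift_pred):
    """Mean squared error between the transformed outcome and predicted uplift."""
    z = transformed_outcome(y, w, p)
    return float(np.mean((z - uplift_pred) ** 2))

# Synthetic randomized controlled trial (hypothetical data-generating process).
rng = np.random.default_rng(0)
n, p = 20_000, 0.5
x = rng.normal(size=n)                  # single covariate
w = rng.binomial(1, p, size=n)          # random treatment assignment
tau = 0.5 * x                           # true uplift, known only because data is simulated
y = 3.0 + 2.0 * x + tau * w + rng.normal(size=n)

# Fit a rough outcome model on one half, evaluate on the other half.
fit = np.arange(n) < n // 2
ev = ~fit
coef = np.polyfit(x[fit], y[fit], deg=1)   # linear fit of y on x (illustrative choice)
m_hat = np.polyval(coef, x[ev])

# Raw versus adjusted transformed outcome on the evaluation half.
z_raw = transformed_outcome(y[ev], w[ev], p)
z_adj = transformed_outcome(y[ev] - m_hat, w[ev], p)

print("variance of raw transformed outcome:     ", np.var(z_raw))
print("variance of adjusted transformed outcome:", np.var(z_adj))
print("raw MSE of the true uplift:     ", transformed_outcome_mse(y[ev], w[ev], p, tau[ev]))
print("adjusted MSE of the true uplift:", transformed_outcome_mse(y[ev] - m_hat, w[ev], p, tau[ev]))
```

In this simulation both versions of the transformed outcome average to roughly the same value, but the adjusted one is far less noisy, so a metric computed from it should fluctuate less from sample to sample. Fitting the outcome model on a separate half of the data keeps the subtracted term independent of the treatment assignments used for evaluation.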