A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized controlled trials (RCTs) is the lack of a ground truth (or validation set) against which to test their performance. In this paper, we propose a novel cross-validation-like methodology to address this challenge. The key insight of our procedure is that the noisy (but unbiased) difference-of-means estimate can be used as a ground-truth "label" on a portion of the RCT, to test the performance of an estimator trained on the other portion. We combine this insight with an aggregation scheme, which borrows statistical strength across a large collection of RCTs, to present an end-to-end methodology for judging an estimator's ability to recover the underlying treatment effect as well as to produce an optimal treatment "roll-out" policy. We evaluate our methodology across 699 RCTs implemented in the Amazon supply chain. In this heavy-tailed setting, our methodology suggests that procedures that aggressively downweight or truncate large values, while introducing bias, lower the variance enough that the treatment effect is estimated more accurately.
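The held-out "label" idea can be illustrated with a minimal sketch. It is not the exact procedure of the paper: the half-and-half split, the squared-error loss, the winsorizing estimator, the clipping threshold `cap`, and the Pareto-distributed synthetic outcomes are all assumptions chosen for illustration, and every function name is hypothetical. Because the held-out difference of means is unbiased and independent of the training half, the expected squared error against it equals the estimator's true mean squared error plus a noise term that does not depend on the estimator, so averaging over many RCTs can still rank estimators.

```python
import random
import statistics


def diff_of_means(treated, control):
    """Unbiased difference-of-means estimate of the treatment effect."""
    return statistics.fmean(treated) - statistics.fmean(control)


def winsorized_diff_of_means(treated, control, cap=20.0):
    """Difference of means after clipping outcomes to [-cap, cap].

    Biased, but lower variance on heavy-tailed data. `cap` is an
    illustrative threshold, not a value taken from the paper.
    """
    def clip(xs):
        return [max(-cap, min(cap, x)) for x in xs]
    return diff_of_means(clip(treated), clip(control))


def split_half(values, rng):
    """Randomly split one arm of an RCT into two disjoint halves."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def held_out_error(estimator, treated, control, rng):
    """Squared error of `estimator`, computed on one half of the RCT,
    against the difference-of-means "label" from the other half."""
    t_train, t_test = split_half(treated, rng)
    c_train, c_test = split_half(control, rng)
    estimate = estimator(t_train, c_train)
    label = diff_of_means(t_test, c_test)  # noisy but unbiased proxy
    return (estimate - label) ** 2


def aggregate_error(estimator, rcts, rng):
    """Average the held-out error over a collection of RCTs."""
    return statistics.fmean(
        [held_out_error(estimator, t, c, rng) for t, c in rcts]
    )


def simulate_rcts(n_rcts, n_units, effect, rng):
    """Synthetic heavy-tailed RCTs (hypothetical data, Pareto outcomes)."""
    rcts = []
    for _ in range(n_rcts):
        control = [rng.paretovariate(1.5) for _ in range(n_units)]
        treated = [rng.paretovariate(1.5) + effect for _ in range(n_units)]
        rcts.append((treated, control))
    return rcts


if __name__ == "__main__":
    rng = random.Random(0)
    rcts = simulate_rcts(n_rcts=200, n_units=400, effect=0.5, rng=rng)
    for name, est in [("difference of means", diff_of_means),
                      ("winsorized", winsorized_diff_of_means)]:
        print(name, aggregate_error(est, rcts, rng))
```

Running the script prints one aggregated held-out error per estimator; the estimator with the lower value is the one this sketch would prefer on the simulated collection.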