Online A/B testing plays a critical role in the high-tech industry, guiding product development and accelerating innovation. It performs a null-hypothesis statistical test to determine which variant is better. However, a typical A/B test presents two problems: (i) the fixed-horizon framework inflates the false-positive rate under continuous monitoring; (ii) the homogeneous-effects assumption fails to identify a subgroup with a beneficial treatment effect. In this paper, we propose a sequential test for subgroup treatment effects based on value difference, named SUBTLE, to address these two problems simultaneously. SUBTLE allows experimenters to "peek" at the results during the experiment without harming the statistical guarantees. It assumes heterogeneous treatment effects and tests whether some subgroup of the population benefits from the treatment under investigation. If the testing result indicates the existence of such a subgroup, the subgroup is identified using a readily available estimated optimal treatment rule. We examine the empirical performance of our proposed test in both simulations and on a real dataset. The results show that SUBTLE has high detection power with type I error controlled at any time, is more robust to noise covariates, and can achieve early stopping compared with the corresponding fixed-horizon test.
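Problem (i) can be illustrated with a minimal simulation sketch (not the paper's method): under the null hypothesis of no treatment effect, repeatedly applying a fixed-horizon z-test after each new batch of data ("peeking") rejects far more often than the nominal level. All sample sizes and batch counts below are illustrative assumptions.

```python
import numpy as np

# Simulate continuous monitoring of an A/B test under the null:
# both arms draw from the same distribution, yet repeated fixed-horizon
# z-tests at level alpha inflate the chance of ever rejecting.
rng = np.random.default_rng(0)
alpha = 0.05            # nominal per-test significance level
n_sims = 500            # number of simulated experiments (assumed)
n_batches, batch = 20, 100  # 20 interim looks, 100 users per arm per look

false_pos = 0
for _ in range(n_sims):
    a = rng.normal(size=n_batches * batch)  # control arm outcomes
    b = rng.normal(size=n_batches * batch)  # treatment arm outcomes
    for k in range(1, n_batches + 1):
        n = k * batch                        # data seen so far
        diff = b[:n].mean() - a[:n].mean()
        se = np.sqrt(a[:n].var(ddof=1) / n + b[:n].var(ddof=1) / n)
        if abs(diff / se) > 1.96:            # two-sided z-test at alpha=0.05
            false_pos += 1
            break                            # experimenter stops on "success"

rate = false_pos / n_sims
print(f"anytime false-positive rate: {rate:.2f} (nominal {alpha})")
```

With 20 interim looks, the probability of at least one false rejection is several times the nominal 5%, which is exactly the inflation a sequential test such as SUBTLE is designed to avoid.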