Modern A/B testing is challenging because of its large scale along several dimensions, which demands the flexibility to handle multiple hypotheses that are tested sequentially. The state-of-the-art practice first reduces the observed data stream to always-valid p-values and then chooses a cut-off as in conventional multiple testing schemes. Here we propose an alternative method, AMSET (adaptive multistage empirical Bayes test), which incorporates historical data into decision-making to achieve efficiency gains while retaining marginal false discovery rate (mFDR) control that is immune to peeking. We also show that a fully data-driven estimation procedure in AMSET performs robustly across a range of simulation settings and on real data from a large mobile-app social network company.
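The baseline practice mentioned in the abstract (always-valid p-values followed by a conventional multiple testing cut-off) can be illustrated with a minimal sketch. This is not AMSET, and none of it is taken from the paper: the sketch assumes normally distributed observations with known standard deviation `sigma`, builds the always-valid p-value from a normal-mixture sequential probability ratio test with a hypothetical mixing scale `tau`, and uses the Benjamini-Hochberg step-up rule as a stand-in for the unspecified "conventional" scheme. The function names are illustrative only.

```python
import math


def always_valid_p_values(xs, sigma=1.0, tau=1.0):
    """Running always-valid p-values for H0: mean = 0.

    Assumes (for illustration) x_i ~ N(theta, sigma^2) with known sigma and a
    N(0, tau^2) mixing distribution over theta under the alternative.
    """
    p, running_sum, out = 1.0, 0.0, []
    for n, x in enumerate(xs, start=1):
        running_sum += x
        v = sigma**2 + n * tau**2
        # Log of the mixture likelihood ratio after n observations.
        log_lr = 0.5 * math.log(sigma**2 / v) + (
            tau**2 * running_sum**2 / (2 * sigma**2 * v)
        )
        # The p-value is the running minimum of 1 / likelihood ratio, capped at 1.
        p = min(p, math.exp(-log_lr))
        out.append(p)
    return out


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices rejected by the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank  # largest rank whose p-value is under its threshold
    return sorted(order[:k])


# Two toy streams: one with a clear effect, one with no signal at all.
streams = [[1.0] * 50, [0.0] * 50]
current_p = [always_valid_p_values(s)[-1] for s in streams]
rejected = benjamini_hochberg(current_p, alpha=0.05)
```

Because each p-value sequence is non-increasing in the number of observations, the experimenter can look at `current_p` at any time and apply the cut-off without the usual peeking penalty; the price is conservativeness, which is the efficiency gap the abstract says AMSET targets by using historical data.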