When interpreting A/B tests, we typically focus only on the statistically significant results and take them at face value. This practice, termed post-selection inference in the statistical literature, may negatively affect both point estimation and uncertainty quantification, and therefore hinder trustworthy decision-making in A/B testing. To address this issue, in this paper we explore two seemingly unrelated paths, one based on supervised machine learning and the other on empirical Bayes, and propose post-selection inferential approaches that combine the strengths of both. Through large-scale simulated and empirical examples, we demonstrate that our proposed methodologies stand out among existing approaches, both in reducing post-selection bias and in improving confidence interval coverage rates, and we discuss how they can be conveniently adapted to real-life scenarios.
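To make the post-selection bias concrete, the following is a minimal illustrative sketch, not the paper's proposed method: it simulates a hypothetical portfolio of A/B tests, selects the statistically significant "winners" at face value, and then applies a classic normal-normal empirical Bayes shrinkage, fit by the method of moments, to mitigate the resulting winner's curse. All quantities here (the simulated lifts, standard errors, and the 1.96 significance cutoff) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical portfolio of experiments: observed lift estimates
# x_i ~ N(theta_i, s_i^2) with unknown true effects theta_i ~ N(mu, tau^2).
n = 1000
true_lift = rng.normal(0.0, 0.01, size=n)   # unknown true effects
se = np.full(n, 0.01)                       # known standard errors
est = true_lift + rng.normal(0.0, se)       # observed lift estimates

# Naive practice: keep only the significant "winners" at face value.
winners = est / se > 1.96

# Empirical Bayes fit (method of moments) on ALL experiments, not just winners.
mu_hat = est.mean()
tau2_hat = max(est.var() - (se ** 2).mean(), 0.0)

# Posterior mean shrinks each estimate toward the grand mean mu_hat.
shrinkage = tau2_hat / (tau2_hat + se ** 2)
eb_est = mu_hat + shrinkage * (est - mu_hat)

# Post-selection bias on the winners: naive vs. shrunken estimates.
print(f"naive bias: {(est[winners] - true_lift[winners]).mean():+.5f}")
print(f"EB bias:    {(eb_est[winners] - true_lift[winners]).mean():+.5f}")
```

In this stylized setting the shrunken estimates remain approximately unbiased even after selection, because the prior is fit on the full, unselected set of experiments rather than on the winners alone.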