A/B tests have been widely adopted across industries as the golden rule that guides decision making. However, the long-term true-north metrics we ultimately want to drive through A/B tests may take a long time to mature. In these situations, a surrogate metric that predicts the long-term metric is often used instead to conclude whether the treatment is effective. Because the surrogate rarely predicts the true-north metric perfectly, a regular A/B test based on surrogate metrics tends to have a high false positive rate, and the treatment variant the test deems favorable may not be the true winner. In this paper, we discuss how to adjust the A/B testing comparison to ensure that experiment results are trustworthy. We also provide practical guidelines for choosing good surrogate metrics. As a concrete example of how to leverage surrogate metrics for fast decision making, we present a case study on developing and evaluating the predicted confirmed hire surrogate metric in the LinkedIn job marketplace.