Online Controlled Experiments (OCE) are the gold standard for measuring impact and guiding decisions for digital products and services. Despite many methodological advances in this area, the scarcity of public datasets and the lack of a systematic review and categorization of them hinder the field's development. We present the first survey and taxonomy of OCE datasets, which highlight the lack of a public dataset to support the design and running of experiments with adaptive stopping, an increasingly popular approach that enables quick deployment of improvements or rollback of degrading changes. We release the first such dataset, containing daily checkpoints of decision metrics from multiple real experiments run on a global e-commerce platform. The dataset design is guided by a broader discussion of the data requirements for common statistical tests used in digital experimentation. We demonstrate how to use the dataset in the adaptive stopping scenario with sequential and Bayesian hypothesis tests, and how to learn the relevant parameters for each approach.