Many experiments are concerned with the comparison of counts between treatment groups. Examples include the number of successful signups in conversion rate experiments, or the number of errors produced by software versions in canary experiments. Observations typically arrive in data streams and practitioners wish to continuously monitor their experiments, sequentially testing hypotheses while maintaining Type I error probabilities under optional stopping and continuation. These goals are frequently complicated in practice by non-stationary time dynamics. We provide practical solutions through sequential tests of multinomial hypotheses, hypotheses about many inhomogeneous Bernoulli processes and hypotheses about many time-inhomogeneous Poisson counting processes. For estimation, we further provide confidence sequences for multinomial probability vectors, all contrasts among probabilities of inhomogeneous Bernoulli processes and all contrasts among intensities of time-inhomogeneous Poisson counting processes. Together, these provide an "anytime-valid" inference framework for a wide variety of experiments dealing with count outcomes, which we illustrate with a number of industry applications.