AB testing evaluates the difference between a control and a treatment in a statistically rigorous manner. Continuous monitoring allows statistical evaluation of an AB test as it proceeds. One goal of continuous monitoring is early stopping -- confirming a statistically significant difference between control and treatment as soon as possible. Another goal is to maintain some statistical capability to discover significant differences later in the test if they cannot be confirmed earlier. These goals are in conflict -- looser requirements for early stopping leave us with more stringent ones for later. This paper shows that it is impossible to maintain a constant requirement for significance for tests that have no a priori stopping time, but we can come arbitrarily close to that goal by using tests that require repeated significant results to con rm statistically significant differences between treatment and control.
翻译:暂无翻译