Modern data analysis frequently involves large-scale hypothesis testing, which naturally gives rise to the problem of maintaining control of a suitable type I error rate, such as the false discovery rate (FDR). In many biomedical and technological applications, an additional complexity is that hypotheses are tested in an online manner, one by one over time. However, traditional procedures that control the FDR, such as the Benjamini-Hochberg procedure, assume that all p-values are available to be tested at a single time point. To address this challenge, a new field of methodology has developed over the past 15 years showing how to control error rates for online multiple hypothesis testing. In this framework, hypotheses arrive in a stream, and at each time point the analyst decides whether to reject the current hypothesis based both on the evidence against it and on the previous rejection decisions. In this paper, we present a comprehensive exposition of the literature on online error rate control, with a review of key theory as well as a focus on applied examples. We also provide simulation results comparing different online testing algorithms and an up-to-date overview of the many methodological extensions that have been proposed.
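To make the online framework concrete, the following is a minimal sketch of one procedure from the online FDR literature, LOND (Javanmard and Montanari), in which the test level for the current hypothesis grows with the number of previous rejections. It is an illustration only, not the specific method or code of this paper. The function name `lond` and the particular sequence `beta_t = alpha * 6 / (pi^2 t^2)` are assumptions chosen for the example; any nonnegative sequence summing to `alpha` could be used instead, and FDR control relies on suitable assumptions such as independent p-values.

```python
import math
from typing import Iterable, List


def lond(p_values: Iterable[float], alpha: float = 0.05) -> List[bool]:
    """Sketch of the LOND online testing rule.

    Returns one reject (True) / do-not-reject (False) decision per p-value,
    made in arrival order using only past decisions.
    """
    decisions: List[bool] = []
    num_rejections = 0  # rejections made before the current time point

    for t, p in enumerate(p_values, start=1):
        # Assumed illustrative sequence: nonnegative, sums to alpha over t = 1, 2, ...
        beta_t = alpha * 6.0 / (math.pi ** 2 * t ** 2)
        # The test level is scaled up by the number of previous rejections plus one.
        alpha_t = beta_t * (num_rejections + 1)

        reject = p <= alpha_t
        decisions.append(reject)
        num_rejections += int(reject)

    return decisions


if __name__ == "__main__":
    # The same third p-value is rejected only when an earlier rejection occurred.
    print(lond([0.001, 0.5, 0.004]))
    print(lond([0.5, 0.5, 0.004]))
```

In the first stream the early rejection raises the level available at the third time point, so the p-value 0.004 is rejected; in the second stream no earlier rejection has occurred and the same p-value is not rejected. This shows how a decision can depend on both the current evidence and the history of rejections.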