A central limit theorem for the Benjamini-Hochberg false discovery proportion under a factor model

From arXiv. Main changes in version 2: (i) restated Corollary 1 in a way that is clearer and easier to use, (ii) removed a regularity condition for our theorems (in particular, we removed Condition 2 from version 1), and (iii) added a couple of remarks (namely, Remarks 1 and 6 in version 2). Throughout the text we also fixed typos, improved clarity, and added some additional commentary and references.

The Benjamini-Hochberg (BH) procedure remains widely popular despite having limited theoretical guarantees in the commonly encountered scenario of correlated test statistics. Of particular concern is the possibility that the method could exhibit bursty behavior, meaning that it might typically yield no false discoveries while occasionally yielding both a large number of false discoveries and a false discovery proportion (FDP) that far exceeds its own well controlled mean. In this paper, we investigate which test statistic correlation structures lead to bursty behavior and which ones lead to well controlled FDPs. To this end, we develop a central limit theorem for the FDP in a multiple testing setup where the test statistic correlations can be either short-range or long-range as well as either weak or strong. The theorem and our simulations from a data-driven factor model suggest that the BH procedure exhibits severe burstiness when the test statistics have many strong, long-range correlations, but does not otherwise.
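
To make the notion of burstiness concrete, the following is a minimal sketch and not the paper's simulation design. It implements the standard BH step-up rule and draws the FDP under a toy one-factor (equicorrelated) Gaussian model, which is an assumption chosen for illustration; the paper's factor model is data-driven. The function names `bh_reject` and `simulate_fdp`, the one-sided p-values, and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def bh_reject(pvalues, q):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose sorted p-value is at most rank * q / m
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank
    return set(order[:k])


def simulate_fdp(m, m1, rho, signal, q, rng):
    """One draw of the BH FDP under a toy one-factor model (an assumption).

    z_i = mu_i + sqrt(rho) * W + sqrt(1 - rho) * e_i, where W is a shared
    factor and the first m1 hypotheses are non-null with mean `signal`.
    """
    norm = NormalDist()
    w = rng.gauss(0, 1)  # shared factor inducing long-range correlation
    pvals = []
    for i in range(m):
        mu = signal if i < m1 else 0.0
        z = mu + rho ** 0.5 * w + (1 - rho) ** 0.5 * rng.gauss(0, 1)
        pvals.append(1 - norm.cdf(z))  # one-sided p-value
    rejected = bh_reject(pvals, q)
    false_discoveries = sum(1 for i in rejected if i >= m1)
    return false_discoveries / max(len(rejected), 1)  # FDP is 0 if nothing is rejected


if __name__ == "__main__":
    rng = random.Random(0)
    for rho in (0.0, 0.5):  # independent vs. strongly correlated statistics
        fdps = [simulate_fdp(500, 50, rho, 3.0, 0.1, rng) for _ in range(200)]
        print(f"rho={rho}: mean FDP={sum(fdps) / len(fdps):.3f}, max FDP={max(fdps):.3f}")
```

Comparing the mean and the maximum FDP across replicates for the two values of `rho` gives a rough picture of whether the FDP concentrates around its mean or occasionally spikes far above it.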
