Large-scale Multiple Testing: Fundamental Limits of False Discovery Rate Control and Compound Oracle

The false discovery rate (FDR) and the false non-discovery rate (FNR), defined respectively as the expectations of the false discovery proportion (FDP) and the false non-discovery proportion (FNP), are the most popular benchmarks for multiple testing. Despite theoretical and algorithmic advances in recent years, the optimal tradeoff between the FDR and the FNR has been largely unknown except for certain restricted classes of decision rules, e.g., separable rules, or for other performance metrics, e.g., the marginal FDR and the marginal FNR (mFDR and mFNR). In this paper we determine the asymptotically optimal FDR-FNR tradeoff under the two-group random mixture model when the number of hypotheses tends to infinity. Distinct from the optimal mFDR-mFNR tradeoff, which is achieved by separable decision rules, the optimal FDR-FNR tradeoff requires compound rules and randomization even in the large-sample limit. A data-driven version of the oracle rule is proposed and shown to outperform existing methodologies on simulated data for models as simple as the normal mean model. Finally, to address the limitation that the FDR and the FNR control only the expectations, and not the fluctuations, of the FDP and the FNP, we also determine the optimal tradeoff when the FDP and the FNP are controlled with high probability and show that it coincides with that of the mFDR and the mFNR.
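To make the quantities in the abstract concrete, the following is a minimal sketch that computes the FDP and FNP of one set of decisions and estimates the FDR and FNR by Monte Carlo under a two-group normal mean model. It is an illustration only and not the paper's oracle or data-driven rule: the function names, the parameter values (`pi1`, `mu`, `threshold`), the convention that a ratio with zero denominator equals 0, and the use of a simple fixed-threshold (separable) rule are all assumptions made here.

```python
import numpy as np


def fdp_fnp(is_nonnull, rejected):
    """Return (FDP, FNP) for one set of decisions.

    FDP = false discoveries / max(number of rejections, 1)
    FNP = false non-discoveries / max(number of non-rejections, 1)
    The max(., 1) convention (0/0 = 0) is an assumption of this sketch.
    """
    is_nonnull = np.asarray(is_nonnull, dtype=bool)
    rejected = np.asarray(rejected, dtype=bool)
    false_disc = np.sum(rejected & ~is_nonnull)
    false_nondisc = np.sum(~rejected & is_nonnull)
    n_rej = rejected.sum()
    n_acc = rejected.size - n_rej
    return float(false_disc / max(n_rej, 1)), float(false_nondisc / max(n_acc, 1))


def simulate_fdr_fnr(n=2000, pi1=0.1, mu=2.5, threshold=2.0, reps=200, seed=0):
    """Monte Carlo estimate of (FDR, FNR) for a fixed-threshold rule.

    Two-group model (illustrative parameters): each hypothesis is non-null
    with probability pi1; X_i ~ N(mu, 1) if non-null and N(0, 1) otherwise.
    The rule "reject when X_i > threshold" is separable: each decision
    depends only on its own observation.
    """
    rng = np.random.default_rng(seed)
    fdps, fnps = [], []
    for _ in range(reps):
        is_nonnull = rng.random(n) < pi1
        x = rng.normal(loc=mu * is_nonnull, scale=1.0)
        fdp, fnp = fdp_fnp(is_nonnull, x > threshold)
        fdps.append(fdp)
        fnps.append(fnp)
    # FDR and FNR are the expectations of FDP and FNP.
    return float(np.mean(fdps)), float(np.mean(fnps))


if __name__ == "__main__":
    fdr, fnr = simulate_fdr_fnr()
    print(f"estimated FDR = {fdr:.3f}, estimated FNR = {fnr:.3f}")
```

Varying `threshold` traces out one FDR-FNR curve for this particular separable rule; the abstract's point is that the optimal FDR-FNR tradeoff is in general not attained within this class.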
