Large-scale multiple testing is a fundamental problem in high-dimensional statistical inference. Increasingly, various types of auxiliary information reflecting the structural relationships among the hypotheses are available, and exploiting this information can boost statistical power. To this end, we propose a framework based on a two-group mixture model in which the prior probability of being null varies across hypotheses, with a shape-constrained relationship imposed between the auxiliary information and these prior null probabilities. An optimal rejection rule is designed to maximize the expected number of true positives while controlling the average false discovery rate. Focusing on the ordered structure, we develop a robust EM algorithm that simultaneously estimates the prior probabilities of being null and the distribution of $p$-values under the alternative hypothesis. We show, both theoretically and empirically, that the proposed method controls the false discovery rate and achieves higher power than state-of-the-art competitors. Extensive simulations demonstrate the advantage of the proposed method, and datasets from genome-wide association studies are used to illustrate the new methodology.
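To make the ordered-structure idea concrete, here is a minimal sketch. It is not the authors' robust EM algorithm. It rests on several assumptions: the hypotheses are already sorted by the auxiliary information, the prior null probability is nondecreasing along that ordering, and the alternative $p$-value density is assumed to be $f_1(p) = a\,p^{a-1}$ with $0 < a < 1$. The E-step computes posterior null probabilities (local fdr). The M-step updates the prior null probabilities by isotonic regression (pool-adjacent-violators) of those posteriors and updates $a$ in closed form. The rejection rule is a standard local-fdr thresholding stand-in: reject the hypotheses with the smallest local fdr for as long as their average stays at or below $\alpha$. The names `pava`, `em_ordered`, and `reject` are hypothetical.

```python
import numpy as np


def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y (equal weights)."""
    means, counts = [], []
    for v in np.asarray(y, dtype=float):
        means.append(v)
        counts.append(1)
        # Merge the last two blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            c = counts[-2] + counts[-1]
            m = (means[-2] * counts[-2] + means[-1] * counts[-1]) / c
            means.pop()
            counts.pop()
            means[-1], counts[-1] = m, c
    return np.repeat(means, counts)


def em_ordered(p, n_iter=200, tol=1e-8, eps=1e-6):
    """Simplified EM sketch (NOT the paper's robust EM).

    Assumes p is sorted so that the prior null probability pi0_i is
    nondecreasing in i, and that f1(p) = a * p**(a - 1) with 0 < a < 1.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    pi0, a = np.full(len(p), 0.8), 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each hypothesis is null.
        f1 = a * p ** (a - 1.0)
        w = pi0 / (pi0 + (1.0 - pi0) * f1)
        # M-step: monotone pi0 via isotonic regression; closed-form MLE for a.
        new_pi0 = np.clip(pava(w), eps, 1.0 - eps)
        new_a = -np.sum(1.0 - w) / np.sum((1.0 - w) * np.log(p))
        new_a = float(np.clip(new_a, eps, 1.0 - eps))
        done = max(np.max(np.abs(new_pi0 - pi0)), abs(new_a - a)) < tol
        pi0, a = new_pi0, new_a
        if done:
            break
    f1 = a * p ** (a - 1.0)
    lfdr = pi0 / (pi0 + (1.0 - pi0) * f1)
    return pi0, a, lfdr


def reject(lfdr, alpha=0.05):
    """Reject the smallest-lfdr hypotheses while their running mean is <= alpha."""
    order = np.argsort(lfdr)
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, len(lfdr) + 1)
    ok = np.nonzero(running_mean <= alpha)[0]
    rejected = np.zeros(len(lfdr), dtype=bool)
    if ok.size:
        rejected[order[: ok[-1] + 1]] = True
    return rejected


# Illustrative simulation: true pi0 increases along the ordering.
rng = np.random.default_rng(0)
n = 2000
true_pi0 = np.linspace(0.5, 0.95, n)
is_null = rng.random(n) < true_pi0
p = np.where(is_null, rng.random(n), rng.beta(0.2, 1.0, size=n))

pi0_hat, a_hat, lfdr = em_ordered(p)
rejected = reject(lfdr, alpha=0.05)
print(f"a_hat={a_hat:.3f}, rejections={rejected.sum()}, "
      f"false rejections={np.sum(rejected & is_null)}")
```

Because the running mean of sorted local fdr values is nondecreasing, the rule simply takes the longest prefix whose average is at most $\alpha$. Replacing `pava` with a weighted or decreasing isotonic fit would accommodate other shape constraints.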