Efron's two-group model is widely used in large-scale multiple testing. This model assumes that the test statistics are mutually independent; in realistic settings, however, they are typically dependent, and taking the dependence into account can boost power. The general two-group model accounts for the dependence between the test statistics. Optimal policies in the general two-group model require calculating, for each hypothesis, the probability that it is a true null given all test statistics, termed the local false discovery rate (locFDR). Unfortunately, calculating locFDRs under realistic dependence structures can be computationally prohibitive. We propose calculating approximate locFDRs based on a properly defined N-neighborhood for each hypothesis. We prove that thresholding the approximate locFDRs at a fixed threshold controls the marginal false discovery rate for any dependence structure. Furthermore, we prove that this is the optimal procedure in a restricted class of decision rules, where the decision for each hypothesis is guided only by its N-neighborhood. We show through extensive simulations that our proposed method achieves substantial power gains compared to alternative practical approaches, while maintaining conceptual simplicity and computational feasibility. We demonstrate the utility of our method on a genome-wide association study of height.
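To make the idea of a neighborhood-based locFDR concrete, the following is a minimal sketch under assumptions that are not taken from the abstract: the hypothesis states are i.i.d. Bernoulli with alternative probability `pi1`, the test statistics are jointly Gaussian with mean `mu` under the alternative and a known covariance `cov`, and the N-neighborhood of hypothesis i is the window of indices within distance N of i. The function name `approx_locfdr` and all parameter names are hypothetical. Under these assumptions, the approximate locFDR is obtained by summing over all state configurations within the neighborhood, which costs 2^(2N+1) terms per hypothesis instead of 2^m.

```python
import itertools
import numpy as np


def approx_locfdr(z, cov, pi1, mu, n_neighbors):
    """Approximate P(h_i = 0 | z restricted to the neighborhood of i).

    Illustrative assumptions (not from the source): h_i ~ Bernoulli(pi1) i.i.d.,
    z | h ~ N(mu * h, cov), and the neighborhood of i is the index window
    [i - n_neighbors, i + n_neighbors], clipped to the valid range.
    """
    z = np.asarray(z, dtype=float)
    cov = np.asarray(cov, dtype=float)
    m = z.size
    out = np.empty(m)
    for i in range(m):
        lo, hi = max(0, i - n_neighbors), min(m, i + n_neighbors + 1)
        idx = np.arange(lo, hi)
        k = idx.size
        pos = i - lo  # position of hypothesis i inside its window
        prec = np.linalg.inv(cov[np.ix_(idx, idx)])
        # Enumerate all 2^k null/alternative configurations in the window.
        configs = np.array(list(itertools.product([0, 1], repeat=k)), dtype=float)
        resid = z[idx] - mu * configs
        # Gaussian log-likelihood up to a constant shared by all configurations.
        log_lik = -0.5 * np.einsum("ck,kl,cl->c", resid, prec, resid)
        n_alt = configs.sum(axis=1)
        log_prior = n_alt * np.log(pi1) + (k - n_alt) * np.log1p(-pi1)
        log_w = log_lik + log_prior
        w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
        out[i] = w[configs[:, pos] == 0].sum() / w.sum()
    return out
```

A fixed-threshold rule in the spirit of the abstract would then reject hypothesis i whenever `approx_locfdr(...)[i] <= t` for a chosen threshold `t`; how `t` should be set to attain a target error level is not specified here. With `n_neighbors = 0`, the sketch reduces to the marginal locFDR of the independent two-group model.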