Many existing multiple comparison methods start from either Fisher's p-values or local fdr scores. The p-value, usually defined as the tail probability of exceeding the observed test statistic under the null distribution, uses no information from the alternative hypothesis, so the region it targets as signal can be completely wrong, especially when the likelihood ratio function is not monotone. Local fdr based approaches, which usually rely on density functions, are optimal in their oracle form. However, the signal region targeted by their data-driven versions is problematic because non-parametric density estimation converges slowly, especially at the boundaries. In this paper, we propose a new method, the Cdf and Local fdr Assisted multiple Testing method (CLAT), which is optimal in cases where p-value based methods are not. Additionally, its data-driven version relies only on estimating the cumulative distribution function and converges quickly to the oracle version. Both simulations and real data analysis demonstrate that the proposed method outperforms existing ones. Furthermore, the computation is instantaneous thanks to a novel algorithm and scales to large data sets.
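The contrast between p-values and local fdr under a non-monotone likelihood ratio can be illustrated with a small numerical sketch. This is not the CLAT procedure, only a toy two-group model with assumed parameters: a null proportion of 0.9, a standard normal null density, and an alternative density N(2, 0.25^2). All names and values below are hypothetical and chosen for illustration.

```python
import math

# Assumed toy two-group model (illustrative values, not from the paper).
PI0 = 0.9                  # assumed proportion of true nulls
MU1, SIGMA1 = 2.0, 0.25    # assumed alternative: N(2, 0.25^2)


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def p_value(x):
    """One-sided p-value: upper tail probability of x under the N(0, 1) null."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def local_fdr(x):
    """Oracle local fdr: pi0 * f0(x) / f(x), with f the mixture density."""
    null_part = PI0 * normal_pdf(x)
    alt_part = (1.0 - PI0) * normal_pdf(x, MU1, SIGMA1)
    return null_part / (null_part + alt_part)


if __name__ == "__main__":
    for x in (2.0, 4.0):
        print(f"x = {x}: p-value = {p_value(x):.3g}, local fdr = {local_fdr(x):.3g}")
```

Under these assumptions the two scores rank the observations in opposite order. The p-value is smaller at x = 4 than at x = 2, while the local fdr is smaller at x = 2, where the alternative density concentrates. A one-sided p-value threshold would therefore favour the far tail, whereas the oracle local fdr favours an interval around the alternative mean.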