Large-scale multiple testing problems arise frequently in modern scientific research. Conventional multiple testing procedures often lose considerable efficiency because they ignore correlations among tests. Appropriate use of correlation information not only improves the efficacy of multiple testing but also makes the results easier to interpret. Because disease- or trait-related single nucleotide polymorphisms (SNPs) tend to cluster and exhibit serial correlation, multiple testing procedures based on the hidden Markov model (HMM) have been applied successfully in genome-wide association studies (GWAS). Modeling an entire chromosome with a single HMM, however, is rather coarse. To overcome this issue, this paper employs the hierarchical hidden Markov model (HHMM) to describe local correlations among tests and develops a multiple testing procedure that not only automatically divides the chromosome into different classes of regions but also takes local correlations among tests into account. Theoretically, the proposed procedure is shown to be valid and optimal in some sense. A data-driven procedure is then developed to mimic the oracle version. Extensive simulations and a real data analysis show that the new procedure outperforms its competitors.
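To make the idea of an HMM-based oracle procedure concrete, the following is a minimal sketch and not the HHMM procedure proposed in the paper. It assumes a plain two-state HMM (state 0 = null, state 1 = non-null) with known parameters, a standard normal null density, and a normal alternative density; all of these are illustrative assumptions. It computes each test's posterior null probability given all observations (a quantity often called the local index of significance in the HMM-based testing literature) and then rejects the hypotheses with the smallest values for as long as their running average stays at or below the target level. The function names `lis_values` and `lis_step_up` and every parameter value are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def lis_values(z, init, trans, alt_mean, alt_sd=1.0):
    """Posterior P(state_i = 0 | all z) under an assumed two-state HMM.

    init[s] is the initial probability of state s, and trans[r][s] is
    the transition probability from state r to state s. Uses the scaled
    forward-backward recursions to avoid numerical underflow.
    """
    n = len(z)
    # Emission densities: N(0, 1) under the null (assumed), and
    # N(alt_mean, alt_sd^2) under the alternative (assumed).
    emit = [(normal_pdf(x, 0.0, 1.0), normal_pdf(x, alt_mean, alt_sd)) for x in z]

    # Forward pass, normalised at every position.
    alpha = []
    for i in range(n):
        if i == 0:
            a = [init[s] * emit[0][s] for s in (0, 1)]
        else:
            prev = alpha[-1]
            a = [sum(prev[r] * trans[r][s] for r in (0, 1)) * emit[i][s]
                 for s in (0, 1)]
        total = a[0] + a[1]
        alpha.append([a[0] / total, a[1] / total])

    # Backward pass, normalised at every position.
    beta = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        b = [sum(trans[s][r] * emit[i + 1][r] * beta[i + 1][r] for r in (0, 1))
             for s in (0, 1)]
        total = b[0] + b[1]
        beta[i] = [b[0] / total, b[1] / total]

    # Combine both passes into the posterior probability of the null state.
    lis = []
    for i in range(n):
        g0 = alpha[i][0] * beta[i][0]
        g1 = alpha[i][1] * beta[i][1]
        lis.append(g0 / (g0 + g1))
    return lis


def lis_step_up(lis, level):
    """Reject the k smallest values, with k the largest count whose
    running average of sorted values is at most `level`."""
    order = sorted(range(len(lis)), key=lambda i: lis[i])
    running, k = 0.0, 0
    for j, idx in enumerate(order, start=1):
        running += lis[idx]
        if running / j <= level:
            k = j
    rejected = [False] * len(lis)
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Toy input: a short run of large statistics inside a null background.
z_scores = [0.1, -0.3, 3.5, 4.0, 3.8, 0.2, -0.5]
lis = lis_values(z_scores, init=(0.9, 0.1),
                 trans=[[0.95, 0.05], [0.2, 0.8]], alt_mean=3.0)
decisions = lis_step_up(lis, level=0.1)
```

Because the posterior at each position depends on its neighbours through the transition matrix, a cluster of moderately large statistics is treated as stronger evidence than the same statistics scattered in isolation, which is the kind of gain from correlation information that the abstract describes. In a data-driven version the HMM parameters would be estimated rather than supplied, which this sketch does not attempt.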