Motivated by genome-wide association studies (GWAS), we study high-dimensional marginal screening of categorical variables, where the test statistics have approximate chi-square distributions. We characterize four new phase transitions in high-dimensional chi-square models, and derive the signal sizes that are necessary and sufficient for statistical procedures to simultaneously control false discovery (in terms of family-wise error rate or false discovery rate) and missed detection (in terms of family-wise non-discovery rate or false non-discovery rate) in large dimensions. Remarkably, the degrees of freedom of the chi-square distributions do not affect the boundaries in any of the four phase transitions. Several well-known procedures are shown to attain these boundaries. Two new phase transitions are also identified in the Gaussian location model under one-sided alternatives. We then elucidate the nature of signal sizes in association tests by characterizing their relationship with marginal frequencies, odds ratios, and sample sizes in $2\times2$ contingency tables. This allows us to illustrate an interesting manifestation of the phase transition phenomena in GWAS. We also show, perhaps surprisingly, that for a given total sample size, balanced designs in such association studies rarely deliver optimal power for detecting the effects of rare genetic variants.
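The dependence of signal size on allele frequency, odds ratio, and sample size can be made concrete with a rough numerical sketch. It rests on assumptions that are not taken from the paper: the signal size is taken to be the standard large-sample noncentrality parameter of Pearson's chi-square test (total sample size times Cohen's $w^2$), and the $2\times2$ table is parametrized by a hypothetical fraction of cases `phi`, a risk-allele frequency `f` among controls, and an odds ratio. The paper's own parametrization may differ, and all function names are illustrative.

```python
def case_allele_freq(f, odds_ratio):
    """Allele frequency among cases implied by control frequency f and the odds ratio."""
    odds = odds_ratio * f / (1.0 - f)
    return odds / (1.0 + odds)


def signal_size(n, phi, f, odds_ratio):
    """Approximate noncentrality of Pearson's chi-square test in a 2x2 table.

    Assumed parametrization (illustrative, not the paper's):
      n          total sample size
      phi        fraction of the sample that are cases
      f          risk-allele frequency among controls
      odds_ratio odds ratio of carrying the allele, cases vs. controls
    """
    p1 = case_allele_freq(f, odds_ratio)
    # Cell probabilities: rows are (cases, controls), columns are (allele, no allele).
    mu = [[phi * p1, phi * (1.0 - p1)],
          [(1.0 - phi) * f, (1.0 - phi) * (1.0 - f)]]
    rows = [sum(r) for r in mu]
    cols = [mu[0][j] + mu[1][j] for j in range(2)]
    # Cohen's w^2: squared deviation from independence, scaled by the null cell probabilities.
    w2 = sum((mu[i][j] - rows[i] * cols[j]) ** 2 / (rows[i] * cols[j])
             for i in range(2) for j in range(2))
    return n * w2


def best_case_fraction(f, odds_ratio, grid=999):
    """Grid search for the case fraction that maximizes the signal size at fixed n."""
    candidates = [(k + 1) / (grid + 1) for k in range(grid)]
    return max(candidates, key=lambda phi: signal_size(1.0, phi, f, odds_ratio))


if __name__ == "__main__":
    f, odds_ratio, n = 0.01, 3.0, 10_000
    phi_star = best_case_fraction(f, odds_ratio)
    print("balanced design signal size:", signal_size(n, 0.5, f, odds_ratio))
    print("best case fraction on grid: ", phi_star)
    print("signal size at that fraction:", signal_size(n, phi_star, f, odds_ratio))
```

Under these assumptions, for a rare variant the grid search returns a case fraction away from one half, which is in line with the abstract's remark about balanced designs. The sketch says nothing about the exact optimal design derived in the paper.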