Motivated by the study of weak signals in underpowered genome-wide association studies (GWASs), we consider the problem of retaining true signals that are not strong enough to be individually separable from a large amount of noise. We address the challenge from the perspective of false negative control and present false negative control (FNC) screening, a data-driven method that efficiently regulates the false negative proportion at a user-specified level. FNC screening is developed in a realistic setting with arbitrary covariance dependence between variables. We calibrate the overall dependence through a parameter whose scale is compatible with the existing phase diagram in high-dimensional sparse inference. Using this new calibration, we characterize asymptotically the joint effect of covariance dependence, signal sparsity, and signal intensity on the proposed method. We interpret the results using a new phase diagram, which shows that FNC screening can efficiently select a set of candidate variables that retains a high proportion of signals even when the signals are not individually separable from noise. The finite-sample performance of FNC screening is compared with that of several existing methods in simulation studies. The proposed method outperforms the others in adapting to a user-specified false negative control level. We apply FNC screening to empower a two-stage GWAS procedure, which demonstrates substantial power gain when working with limited sample sizes in real applications.