False discovery rate (FDR) controlling procedures provide important statistical guarantees for the replicability of signal identification in multiple hypothesis testing. In many fields of study, FDR controlling procedures are used in high-dimensional (HD) analyses to discover features that are truly associated with the outcome. In some recent applications, data on the same set of candidate features are collected independently in multiple studies. For example, gene expression data are collected at different facilities and from different cohorts to identify genetic biomarkers of multiple types of cancer. Such studies offer an opportunity to identify signals by jointly considering information from different, potentially heterogeneous, sources. This paper addresses how to provide FDR control guarantees for tests of union null hypotheses of conditional independence. We present a knockoff-based variable selection method (\textit{Simultaneous knockoffs}) that identifies mutual signals from multiple independent data sets and provides exact FDR control guarantees in finite samples. The method accommodates very general model settings and test statistics. We demonstrate its performance with extensive numerical studies and two real data examples.