False discovery rate (FDR) controlling procedures provide important statistical guarantees for reproducibility in signal identification experiments that involve testing multiple hypotheses. In many recent applications, the same set of candidate features is studied in multiple independent experiments. Examples include experiments repeated at different facilities and with different cohorts, and association studies with the same candidate features but different outcomes of interest. Such studies make it possible to identify signals by analyzing the experiments jointly. We study how to provide reproducibility guarantees when testing composite null hypotheses on multiple features. Specifically, for each feature we test the union of its null hypotheses across the experiments. We present a knockoff-based variable selection method that identifies mutual signals across multiple independent experiments with finite-sample FDR control. We demonstrate the performance of this method through numerical studies and applications to crime data and TCGA data.
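To make the selection step concrete, here is a minimal sketch. It assumes that each experiment has already produced one knockoff statistic W_j per feature, with large positive values indicating evidence of a signal (the usual knockoff convention). The per-experiment statistics are merged with a hypothetical rule chosen only for illustration: the magnitude is the smallest |W_j| across experiments and the sign is the product of the per-experiment signs. This is not necessarily the statistic used in the paper. The standard knockoff+ threshold of Barber and Candès is then applied to the merged statistics. The function names `knockoff_plus_threshold`, `combine_experiments`, and `select_mutual` are illustrative, and the input numbers are made up.

```python
from typing import List, Sequence


def knockoff_plus_threshold(w: Sequence[float], q: float) -> float:
    """Knockoff+ threshold: the smallest t > 0 with
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns inf when no t qualifies, so nothing is selected."""
    # The ratio only changes at the observed magnitudes |W_j|,
    # so those are the only candidate thresholds worth checking.
    for t in sorted({abs(x) for x in w if x != 0}):
        n_neg = sum(1 for x in w if x <= -t)
        n_pos = sum(1 for x in w if x >= t)
        if (1 + n_neg) / max(1, n_pos) <= q:
            return t
    return float("inf")


def combine_experiments(w_by_exp: List[Sequence[float]]) -> List[float]:
    """HYPOTHETICAL combination rule (an assumption for illustration):
    magnitude = smallest |W_j| across experiments,
    sign = product of the per-experiment signs (0 if any W_j is 0)."""
    combined = []
    for vals in zip(*w_by_exp):  # one tuple of W_j values per feature
        if any(v == 0 for v in vals):
            combined.append(0.0)
            continue
        sign = 1
        for v in vals:
            sign *= 1 if v > 0 else -1
        combined.append(sign * min(abs(v) for v in vals))
    return combined


def select_mutual(w_by_exp: List[Sequence[float]], q: float) -> List[int]:
    """Indices of features whose combined statistic clears the knockoff+ threshold."""
    w = combine_experiments(w_by_exp)
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]


# Made-up statistics for two experiments with 12 candidate features.
exp_a = [5, 4, 6, 3, 2, -1, 0.5, -0.3, 1.2, -0.8, 0.2, 3.5]
exp_b = [4, 5, 3, 6, -2, 1, -0.4, 0.6, -1.1, 0.9, 0.3, 2.5]

print(select_mutual([exp_a, exp_b], q=0.25))  # [0, 1, 2, 3, 11]
print(select_mutual([exp_a, exp_b], q=0.10))  # []
```

Taking the minimum magnitude means a feature ranks highly only when it is strong in every experiment, which matches the goal of finding mutual signals. The product of signs is used because, if a feature is null in at least one independent experiment, that experiment contributes a symmetric sign to the product. This is a design sketch under the stated assumptions. The paper's own statistic and its finite-sample FDR argument may differ.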