We develop a new class of distribution-free multiple testing rules for false discovery rate (FDR) control under general dependence. A key element in our proposal is a symmetrized data aggregation (SDA) approach to incorporating the dependence structure via sample splitting, data screening and information pooling. The proposed SDA filter first constructs a sequence of ranking statistics that fulfill global symmetry properties, and then chooses a data-driven threshold along the ranking to control the FDR. The SDA filter substantially outperforms the knockoff method in power under moderate to strong dependence, and is more robust than existing methods based on asymptotic $p$-values. We first develop finite-sample theory to provide an upper bound for the actual FDR under general dependence, and then establish the asymptotic validity of SDA for both FDR and false discovery proportion (FDP) control under mild regularity conditions. The procedure is implemented in the R package \texttt{SDA}. Numerical results confirm the effectiveness and robustness of SDA in FDR control and show that it achieves substantial power gains over existing methods in many settings.
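The two-stage recipe described above (build ranking statistics that are symmetric about zero under the null, then pick a data-driven threshold along the ranking) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the abstract: a one-sample mean-testing problem, a single random 50/50 split of the observations, a statistic W_j defined as the product of the two split-half t-statistics, and a knockoff-style threshold with an optional `offset` of 0 or 1. The screening and information-pooling steps that SDA uses to incorporate dependence are omitted, and the function names (`split_half_statistics`, `symmetric_threshold`, `select`) are hypothetical; this is not the interface of the R package SDA.

```python
import math
import random
import statistics


def split_half_statistics(data, seed=0):
    """Hypothetical helper: W_j = Z1_j * Z2_j from a random 50/50 sample split.

    data is a list of n observations, each a list of p coordinates.
    Z_kj is the one-sample t-statistic of coordinate j on half k. The halves
    are independent, so if the null distribution is symmetric the product
    W_j is symmetric about zero, while a true nonzero mean tends to make both
    factors share a sign and push W_j to a large positive value.
    """
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    half = len(idx) // 2
    halves = ([data[i] for i in idx[:half]], [data[i] for i in idx[half:]])
    p = len(data[0])
    z = []
    for part in halves:
        n = len(part)
        part_z = []
        for j in range(p):
            col = [obs[j] for obs in part]
            part_z.append(math.sqrt(n) * statistics.fmean(col) / statistics.stdev(col))
        z.append(part_z)
    return [a * b for a, b in zip(z[0], z[1])]


def symmetric_threshold(w, alpha, offset=0):
    """Smallest t in {|W_j|} with (offset + #{W_j <= -t}) / max(#{W_j >= t}, 1) <= alpha.

    By symmetry, the count of large negative statistics estimates the number
    of false discoveries among the large positive ones. offset=1 gives a more
    conservative, knockoff+-style estimate (an assumption, not the SDA rule).
    Returns math.inf when no threshold qualifies, so nothing is rejected.
    """
    for t in sorted({abs(x) for x in w if x != 0}):
        neg = sum(1 for x in w if x <= -t)
        pos = sum(1 for x in w if x >= t)
        if (offset + neg) / max(pos, 1) <= alpha:
            return t
    return math.inf


def select(w, alpha, offset=0):
    """Indices j whose W_j is at or above the data-driven threshold."""
    t = symmetric_threshold(w, alpha, offset)
    return [j for j, x in enumerate(w) if x >= t]


if __name__ == "__main__":
    # Simulated example: independent N(mu, 1) coordinates, signal in the first three.
    rng = random.Random(1)
    n, p = 40, 20
    means = [3.0] * 3 + [0.0] * (p - 3)
    data = [[rng.gauss(mu, 1.0) for mu in means] for _ in range(n)]
    w = split_half_statistics(data)
    print(select(w, alpha=0.2))
```

Searching only over the observed values |W_j| is enough because the estimated FDP is constant between consecutive values. With `offset=0` the rule mirrors the knockoff filter's threshold; whether SDA uses an offset, and how its screening and pooling steps handle dependence, should be checked against the paper and the package documentation.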