A fundamental task in the analysis of datasets with many variables is screening for associations. This can be cast as a multiple testing task, where the objective is achieving high detection power while controlling type I error. We consider $m$ hypothesis tests represented by pairs $((P_i, X_i))_{1\leq i \leq m}$ of p-values $P_i$ and covariates $X_i$, such that $P_i \perp X_i$ if $H_i$ is null. Here, we show how to use information potentially available in the covariates about heterogeneities among hypotheses to increase power compared to conventional procedures that only use the $P_i$. To this end, we upgrade existing weighted multiple testing procedures through the Independent Hypothesis Weighting (IHW) framework to use data-driven weights that are calculated as a function of the covariates. Finite sample guarantees, e.g., false discovery rate (FDR) control, are derived from cross-weighting, a data-splitting approach that enables learning the weight-covariate function without overfitting as long as the hypotheses can be partitioned into independent folds, with arbitrary within-fold dependence. IHW has increased power compared to methods that do not use covariate information. A key implication of IHW is that hypothesis rejection in common multiple testing setups should not proceed according to the ranking of the p-values, but by an alternative ranking implied by the covariate-weighted p-values.
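To make the cross-weighting idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the weight learner below is a deliberately simple covariate-binning heuristic (the paper instead optimizes the weight–covariate function), the fold assignment is random for illustration (the framework only requires that hypotheses can be partitioned into independent folds), and all function names (`weighted_bh`, `learn_weights`, `ihw_cross_weighted`) and numerical choices are illustrative assumptions.

```python
# Sketch of covariate-weighted multiple testing with cross-weighting.
# NOT the authors' IHW software; the weight learner is a toy heuristic that
# up-weights covariate bins enriched for small p-values.

import numpy as np

def weighted_bh(pvals, weights, alpha=0.1):
    """Weighted Benjamini-Hochberg on Q_i = P_i / w_i, assuming mean(w) ~ 1."""
    m = len(pvals)
    q = np.where(weights > 0, pvals / np.maximum(weights, 1e-300), np.inf)
    order = np.argsort(q)
    passed = q[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(passed)[0]) + 1 if passed.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected

def learn_weights(pvals, covariates, n_bins=5, tau=0.1):
    """Toy weight learner: bin the covariate and score each bin by the excess
    of p-values below tau (a crude proxy for the local alternative rate)."""
    edges = np.quantile(covariates, np.linspace(0, 1, n_bins + 1))
    bin_ids = np.digitize(covariates, edges[1:-1])
    scores = np.array([
        max((pvals[bin_ids == b] <= tau).mean() - tau, 0) + 1e-3
        if (bin_ids == b).any() else 1e-3
        for b in range(n_bins)
    ])
    def weight_fn(x):
        w = scores[np.digitize(x, edges[1:-1])]
        return w / w.mean()  # normalize so the weights average to one
    return weight_fn

def ihw_cross_weighted(pvals, covariates, n_folds=3, alpha=0.1, seed=None):
    """Cross-weighting: the weights for fold k are learned only from the other
    folds, so each hypothesis's weight is independent of its own p-value."""
    rng = np.random.default_rng(seed)
    m = len(pvals)
    folds = rng.integers(0, n_folds, size=m)  # illustrative random folds
    weights = np.empty(m)
    for k in range(n_folds):
        held_out = folds == k
        w_fn = learn_weights(pvals[~held_out], covariates[~held_out])
        weights[held_out] = w_fn(covariates[held_out])
    return weighted_bh(pvals, weights, alpha=alpha)

# Simulated example: small covariate values are enriched for true signals.
rng = np.random.default_rng(0)
m = 20000
x = rng.uniform(size=m)
is_alt = rng.uniform(size=m) < 0.2 * (1 - x)
p = np.where(is_alt, rng.beta(0.25, 1, size=m), rng.uniform(size=m))
print("rejections with covariate weights:", ihw_cross_weighted(p, x, seed=1).sum())
print("rejections with uniform weights:  ", weighted_bh(p, np.ones(m)).sum())
```

The key design point the sketch tries to reflect is that each hypothesis is ranked by its weighted p-value $P_i / w(X_i)$ rather than by $P_i$ itself, and that the weight applied to a hypothesis is learned only from other folds, which is what keeps the weight independent of that hypothesis's own p-value.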