We propose the Terminating-Knockoff (T-Knock) filter, a fast variable selection method for high-dimensional data. The T-Knock filter controls a user-defined target false discovery rate (FDR) while maximizing the number of selected variables. This is achieved by fusing the solutions of multiple early-terminated random experiments. The experiments are conducted on a combination of the original predictors and multiple sets of randomly generated knockoff predictors. A finite-sample proof of the FDR control property, based on martingale theory, is provided. Numerical simulations show that the FDR is controlled at the target level while achieving high power. We prove under mild conditions that the knockoffs can be sampled from any univariate probability distribution with finite expectation and variance. The computational complexity of the proposed method is derived, and it is demonstrated via numerical simulations that the sequential computation time is multiple orders of magnitude lower than that of the strongest benchmark methods in sparse high-dimensional settings. The T-Knock filter outperforms state-of-the-art methods for FDR control on a simulated genome-wide association study (GWAS), while its computation time is more than two orders of magnitude lower than that of the strongest benchmark methods. An open-source R package containing the implementation of the T-Knock filter is available at https://github.com/jasinmachkour/tknock.
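The core idea described above (appending randomly generated knockoff predictors, running early-terminated random experiments, and fusing their selections) can be illustrated with a minimal toy sketch. This is not the paper's algorithm or the `tknock` R package; the data-generating model, the correlation-based ranking used as a stand-in for a solution path, the termination threshold `T_stop`, the number of experiments `K`, and the 50% fusion rule are all simplifying assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 300, 100          # samples, original predictors
k_true = 5               # truly active predictors (toy simulation only)
T_stop = 10              # terminate an experiment once T_stop knockoffs enter

# Toy sparse linear model (illustrative, not the paper's setup).
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k_true] = 3.0
y = X @ beta + rng.standard_normal(n)

def one_experiment(X, y, rng, T_stop):
    """One early-terminated random experiment: append a fresh set of
    knockoffs, rank all predictors by |correlation| with y (a crude
    stand-in for a solution path), and stop once T_stop knockoffs
    have entered the ranking."""
    n, p = X.shape
    # Knockoffs are i.i.d. draws from a univariate distribution with
    # finite expectation and variance (standard normal here).
    knockoffs = rng.standard_normal((n, p))
    X_aug = np.hstack([X, knockoffs])
    order = np.argsort(-np.abs(X_aug.T @ y))  # strongest predictors first
    selected, n_knock = set(), 0
    for j in order:
        if j >= p:                # a knockoff entered the ranking
            n_knock += 1
            if n_knock >= T_stop:
                break             # early termination
        else:
            selected.add(int(j))
    return selected

# Fuse K experiments: keep variables selected in a majority of them
# (a simple fusion rule assumed for this sketch).
K = 20
counts = np.zeros(p)
for _ in range(K):
    for j in one_experiment(X, y, rng, T_stop):
        counts[j] += 1
kept = np.flatnonzero(counts / K > 0.5)
```

Because the knockoffs are independent of `y`, they rank like null predictors, so the number of knockoffs entering before termination acts as a proxy for the number of false discoveries among the original predictors; the actual T-Knock filter calibrates the termination and fusion steps to provably control the FDR at the target level.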