Controlled feature selection aims to discover the features a response depends on while limiting the false discovery rate (FDR) to a predefined level. Recently, multiple deep-learning-based methods have been proposed to perform controlled feature selection through the Model-X knockoff framework. We demonstrate, however, that these methods often fail to control the FDR, for two reasons. First, they often learn inaccurate models of the feature distribution. Second, the "swap" property, which knockoffs must satisfy to be valid, is often poorly enforced. We propose a new procedure, FlowSelect, that performs controlled feature selection without suffering from either problem. To model the features more accurately, FlowSelect uses normalizing flows, a state-of-the-art approach to density estimation. Instead of enforcing the "swap" property, FlowSelect uses a novel MCMC-based procedure to compute a p-value for each feature directly. Asymptotically, these p-values are valid. Empirically, FlowSelect consistently controls the FDR on both synthetic and semi-synthetic benchmarks, whereas competing knockoff-based approaches do not. FlowSelect also achieves greater power on these benchmarks. Finally, FlowSelect correctly identifies the genetic variants associated with specific soybean traits in GWAS data.
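To make the "p-values per feature, then FDR control" idea concrete, here is a minimal sketch in the spirit of a conditional randomization test. It is not FlowSelect's algorithm; it rests on several assumptions of ours. Column j is redrawn from p(x_j | x_-j) and the observed statistic is compared with the redrawn ones. The statistic is the absolute Pearson correlation with the response, and the Benjamini-Hochberg step-up procedure turns the p-values into a selection. The sampler `independent_gaussian_conditional` is a hypothetical stand-in for a normalizing flow combined with MCMC. It is exact only because the toy features here are independent N(0, 1), so the conditional reduces to the marginal. All function names are illustrative.

```python
import math
import random


def abs_corr(a, b):
    """Absolute Pearson correlation; used here as the (assumed) test statistic."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / math.sqrt(va * vb)


def crt_pvalue(X, y, j, sample_conditional, n_samples, rng):
    """Randomization p-value for feature j.

    Redraws column j from p(x_j | x_-j) n_samples times and counts how often
    the redrawn statistic is at least as large as the observed one.
    """
    t_obs = abs_corr([row[j] for row in X], y)
    exceed = 0
    for _ in range(n_samples):
        xj_null = [sample_conditional(row, j, rng) for row in X]
        if abs_corr(xj_null, y) >= t_obs:
            exceed += 1
    # The +1 terms keep the p-value valid for finite n_samples when the null
    # draws are exact; with approximate (e.g. MCMC) draws validity is asymptotic.
    return (1 + exceed) / (1 + n_samples)


def benjamini_hochberg(pvalues, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k])


def independent_gaussian_conditional(row, j, rng):
    """Hypothetical stand-in for a flow + MCMC sampler: with independent
    N(0, 1) features, p(x_j | x_-j) is simply N(0, 1)."""
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 200, 10
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    # Only features 0 and 1 affect the response; features 2..9 are null.
    y = [2.0 * r[0] - 1.5 * r[1] + rng.gauss(0.0, 1.0) for r in X]
    pvals = [crt_pvalue(X, y, j, independent_gaussian_conditional, 199, rng)
             for j in range(d)]
    print("selected:", benjamini_hochberg(pvals, q=0.1))
```

In this sketch the only model-dependent piece is the conditional sampler. That mirrors the abstract's point: the quality of the feature model (here trivially exact) determines whether the resulting p-values, and hence the FDR guarantee, can be trusted.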