Normalizing Flows for Knockoff-free Controlled Feature Selection

The goal of controlled feature selection is to discover the features a response depends on while limiting the proportion of false discoveries to a predefined level. Recently, multiple methods have been proposed that use deep learning to generate knockoffs for controlled feature selection through the Model-X knockoff framework. We demonstrate, however, that these methods often fail to control the false discovery rate (FDR). There are two reasons for this shortcoming. First, these methods often learn inaccurate models of features. Second, the "swap" property, which is required for knockoffs to be valid, is often not well enforced. We propose a new procedure called FlowSelect that remedies both of these problems. To more accurately model the features, FlowSelect uses normalizing flows, the state-of-the-art method for density estimation. To circumvent the need to enforce the swap property, FlowSelect uses a novel MCMC-based procedure to directly compute p-values for each feature. Asymptotically, FlowSelect controls the FDR exactly. Empirically, FlowSelect controls the FDR well on both synthetic and semi-synthetic benchmarks, whereas competing knockoff-based approaches fail to do so. FlowSelect also demonstrates greater power on these benchmarks. Additionally, using data from a genome-wide association study of soybeans, FlowSelect correctly infers the genetic variants associated with specific soybean traits.
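The abstract names three ingredients: a learned density of the features, an MCMC procedure that yields a p-value per feature, and FDR control. Below is a minimal sketch of how these could fit together as a conditional randomization test. It is not the authors' implementation, and several parts are assumptions.

- A multivariate Gaussian stands in for the trained normalizing flow. Only a log-density function is needed.
- A random-walk Metropolis-Hastings sampler on one column draws null copies of feature j from p(x_j | x_{-j}).
- The Besag-Clifford "parallel" scheme makes the observed and null copies exchangeable for a finite number of steps.
- The absolute least-squares coefficient is used as the test statistic.
- Benjamini-Hochberg turns the p-values into a selection at level q. The abstract does not say how FlowSelect performs this final step.

All function names are hypothetical.

```python
import numpy as np


def fit_gaussian_log_density(X):
    """ASSUMPTION: a Gaussian stand-in for a trained normalizing flow.
    Returns log p(x) up to an additive constant."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    prec = np.linalg.inv(cov)

    def log_density(Z):
        D = Z - mu
        return -0.5 * np.einsum("ni,ij,nj->n", D, prec, D)

    return log_density


def mh_column(X, j, log_density, rng, n_steps, step):
    """Random-walk Metropolis-Hastings on column j, one independent chain per row.
    With x_{-j} fixed, the joint-density ratio equals the conditional ratio,
    so each chain targets p(x_j | x_{-j})."""
    Z = X.copy()
    cur = log_density(Z)
    for _ in range(n_steps):
        prop = Z.copy()
        prop[:, j] += step * rng.standard_normal(len(Z))
        new = log_density(prop)
        accept = np.log(rng.uniform(size=len(Z))) < new - cur
        Z[accept, j] = prop[accept, j]
        cur[accept] = new[accept]
    return Z


def coef_stat(X, y, j):
    """Test statistic (an assumption): |least-squares coefficient of feature j|."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return abs(beta[j])


def crt_pvalues(X, y, log_density, rng, n_null=99, n_steps=30, step=0.5):
    """One p-value per feature. Besag-Clifford parallel scheme: run the reversible
    chain from the observed data to a hub, then run each null copy forward from
    the hub, so observed and null copies are exchangeable under H0."""
    pvals = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        t_obs = coef_stat(X, y, j)
        hub = mh_column(X, j, log_density, rng, n_steps, step)
        t_null = np.array([coef_stat(mh_column(hub, j, log_density, rng, n_steps, step), y, j)
                           for _ in range(n_null)])
        pvals[j] = (1 + np.sum(t_null >= t_obs)) / (n_null + 1)
    return pvals


def benjamini_hochberg(pvals, q):
    """Indices selected by the Benjamini-Hochberg step-up procedure at level q."""
    m = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= q * np.arange(1, m + 1) / m
    if not passed.any():
        return np.array([], dtype=int)
    k = np.nonzero(passed)[0].max()
    return np.sort(order[: k + 1])


rng = np.random.default_rng(0)
n, d = 400, 6
cov = 0.5 * np.ones((d, d)) + 0.5 * np.eye(d)  # equicorrelated features, rho = 0.5
X = rng.multivariate_normal(np.zeros(d), cov, size=n)
y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)  # only features 0 and 1 matter
pvals = crt_pvalues(X, y, fit_gaussian_log_density(X), rng)
selected = benjamini_hochberg(pvals, q=0.1)
print(pvals, selected)
```

The validity of the p-values in this sketch comes from the exchangeability of the observed and resampled columns. It does not depend on how well the chain mixes. Swapping in a better density model, such as a trained flow's `log_prob`, changes only `log_density`.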


Related content

Feature selection


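Feature selection, also called feature subset selection (FSS) or attribute selection, chooses N features from an existing set of M features so that a specific performance metric of the system is optimized. Keeping only the most effective original features reduces the dimensionality of the dataset. Feature selection is an important means of improving the performance of learning algorithms and a key data-preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a good model.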
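Choosing the best N of M features exactly means searching all C(M, N) subsets. A common heuristic is greedy forward selection, which adds one feature at a time. Below is a minimal sketch, assuming the metric is the training-set R² of an ordinary least-squares fit. The function names are hypothetical.

```python
import numpy as np


def r2(X, y):
    """Training-set R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def forward_select(X, y, n_keep):
    """Greedily add the feature that most increases R^2 until n_keep are chosen."""
    chosen, remaining = [], list(range(X.shape[1]))
    for _ in range(n_keep):
        best = max(remaining, key=lambda j: r2(X[:, chosen + [j]], y))
        chosen.append(best)
        remaining.remove(best)
    return chosen


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))  # M = 8 candidate features
y = 3.0 * X[:, 2] + 1.5 * X[:, 5] + 0.5 * rng.standard_normal(200)
print(forward_select(X, y, n_keep=2))
```

Training R² never decreases when a feature is added. That is why N has to be fixed in advance, as in the definition above, or the metric replaced by a held-out score.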
