Normalizing Flows for Knockoff-free Controlled Feature Selection

Controlled feature selection aims to discover the features a response depends on while limiting the false discovery rate (FDR) to a predefined level. Recently, multiple deep-learning-based methods have been proposed to perform controlled feature selection through the Model-X knockoff framework. We demonstrate, however, that these methods often fail to control the FDR for two reasons. First, these methods often learn inaccurate models of features. Second, the "swap" property, which is required for knockoffs to be valid, is often not well enforced. We propose a new procedure called FlowSelect that remedies both of these problems. To more accurately model the features, FlowSelect uses normalizing flows, the state-of-the-art method for density estimation. To circumvent the need to enforce the swap property, FlowSelect uses a novel MCMC-based procedure to calculate p-values for each feature directly. Asymptotically, FlowSelect computes valid p-values. Empirically, FlowSelect consistently controls the FDR on both synthetic and semi-synthetic benchmarks, whereas competing knockoff-based approaches do not. FlowSelect also demonstrates greater power on these benchmarks. Additionally, FlowSelect correctly infers the genetic variants associated with specific soybean traits from GWAS data.
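The abstract describes FlowSelect only at a high level. It learns a joint density of the features, then uses an MCMC procedure that turns that density into a p-value for each feature. The sketch below illustrates that general idea under several stated assumptions and is not the authors' implementation:

- A known zero-mean Gaussian density (`log_joint`) stands in for a trained normalizing flow.
- Random-walk Metropolis-Hastings (`mh_resample_column`) draws each feature from its conditional given the others.
- The p-value follows the conditional randomization test recipe: compare a statistic on the real column against the same statistic on resampled columns.
- The Benjamini-Hochberg procedure turns the p-values into a selection set at target FDR `q`.

All function names, the OLS test statistic, and the tuning values (`K`, `n_steps`, `step`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch, NOT FlowSelect's code: a Gaussian density replaces the
# normalizing flow, and the p-value uses a conditional-randomization recipe.

def log_joint(X, Q):
    """Unnormalized Gaussian log-density of each row of X (precision matrix Q).

    Stand-in for a trained flow's log-likelihood; the additive constant is
    dropped because it cancels in the Metropolis-Hastings acceptance ratio.
    """
    return -0.5 * np.sum((X @ Q) * X, axis=1)

def mh_resample_column(X, j, log_p, n_steps, step, rng):
    """Redraw column j of X from p(x_j | x_-j) with random-walk Metropolis-Hastings.

    Each row runs its own chain, started at the observed value. Only coordinate
    j moves, so the joint-density ratio equals the conditional-density ratio.
    """
    Xc = X.copy()
    cur = log_p(Xc)
    for _ in range(n_steps):
        prop = Xc.copy()
        prop[:, j] += step * rng.standard_normal(len(Xc))  # symmetric proposal
        new = log_p(prop)
        accept = np.log(rng.uniform(size=len(Xc))) < new - cur
        Xc[accept, j] = prop[accept, j]
        cur = np.where(accept, new, cur)
    return Xc

def stat(X, y, j):
    """Test statistic: |OLS coefficient| of feature j (assumed choice)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return abs(beta[j])

def crt_pvalue(X, y, j, log_p, K, n_steps, step, rng):
    """p-value for H0: x_j is independent of y given the other features."""
    t_obs = stat(X, y, j)
    t_null = np.array([
        stat(mh_resample_column(X, j, log_p, n_steps, step, rng), y, j)
        for _ in range(K)
    ])
    return (1 + np.sum(t_null >= t_obs)) / (K + 1)

def benjamini_hochberg(pvals, q):
    """Indices selected by the Benjamini-Hochberg procedure at target FDR q."""
    m = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= q * np.arange(1, m + 1) / m
    if not passed.any():
        return np.array([], dtype=int)
    k = np.nonzero(passed)[0].max()
    return np.sort(order[: k + 1])

# Toy data: AR(1)-correlated Gaussian features; y depends only on x0 and x1.
rng = np.random.default_rng(0)
n, p, rho = 200, 6, 0.5
Sigma = rho ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
Q = np.linalg.inv(Sigma)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
y = 1.5 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)

def log_p(Z):
    return log_joint(Z, Q)

pvals = np.array([
    crt_pvalue(X, y, j, log_p, K=99, n_steps=50, step=1.0, rng=rng)
    for j in range(p)
])
selected = benjamini_hochberg(pvals, q=0.1)
print("p-values:", np.round(pvals, 3))
print("selected features:", selected)
```

Each chain starts at the observed value and runs a fixed number of steps. The resampled column is therefore only approximately independent of the original, and the p-values are only approximately valid; longer chains narrow that gap. Any density model that exposes a per-row log-likelihood could replace `log_joint`, because Metropolis-Hastings needs the density only up to a constant.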


Related content

Feature Selection

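Feature selection, also called feature subset selection (FSS) or attribute selection, chooses N features from an existing set of M features so that a specific performance measure of the system is optimized. It keeps the most effective of the original features to reduce the dimensionality of a dataset. It is an important way to improve the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are key to training a good model.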
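The following is a deliberately simple instance of the definition above, choosing N of M features to optimize a criterion. The sketch scores each feature by its absolute Pearson correlation with the target and keeps the top N. The scoring rule and the function name `select_top_n` are assumptions for illustration. This filter approach gives no false discovery rate guarantee, which is the gap that controlled feature selection methods such as FlowSelect aim to close.

```python
import numpy as np

def select_top_n(X, y, n_keep):
    """Hypothetical filter-style selector: keep the n_keep columns of X whose
    absolute Pearson correlation with y is largest (indices returned sorted)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Correlation of each column with y; assumes no column is constant.
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    ranked = np.argsort(-np.abs(corr))
    return np.sort(ranked[:n_keep])

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 10))  # M = 10 candidate features
y = 2.0 * X[:, 3] - 1.0 * X[:, 7] + 0.5 * rng.standard_normal(500)
print(select_top_n(X, y, n_keep=2))  # N = 2
```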
