Feature Selection in High-dimensional Space Using Graph-Based Methods

High-dimensional feature selection is a central problem in a variety of application domains such as machine learning, image analysis, and genomics. In this paper, we propose graph-based tests as a useful basis for feature selection. We describe an algorithm for selecting informative features in high-dimensional data, where each observation comes from one of $K$ different distributions. Our algorithm can be applied in a completely nonparametric setup, without any distributional assumptions on the data, and it aims to output the features that contribute the most to the overall distributional variation. At the heart of our method is the recursive application of distribution-free graph-based tests to subsets of the feature set, located at different depths of a hierarchical clustering tree constructed from the data. Our algorithm recovers all truly contributing features with high probability, while ensuring optimal control of false discoveries. Finally, we show the superior performance of our method over existing ones on synthetic data, and demonstrate its utility on a real-life dataset from the domain of climate change.
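The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a minimum-spanning-tree test in the style of Friedman and Rafsky (count the tree edges that join observations with different labels, with a permutation p-value) as the graph-based test, and average-linkage clustering of the feature columns as the tree. The function names, the threshold `alpha`, and the permutation count `n_perm` are all hypothetical, and no multiple-testing correction is applied, so the false-discovery guarantee claimed in the abstract is not reproduced here.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def graph_test_pvalue(X, labels, n_perm=200, rng=None):
    """Permutation p-value for an MST-based K-sample test.

    Few MST edges between differently labelled observations suggest
    that the groups follow different distributions.
    """
    rng = np.random.default_rng(rng)
    # Note: exact duplicate rows give zero distances, which scipy
    # treats as missing edges; this sketch assumes continuous data.
    mst = minimum_spanning_tree(squareform(pdist(X))).tocoo()
    rows, cols = mst.row, mst.col
    observed = np.sum(labels[rows] != labels[cols])
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(labels)
        hits += np.sum(perm[rows] != perm[cols]) <= observed
    return (1 + hits) / (1 + n_perm)


def select_features(X, labels, alpha=0.05, n_perm=200, seed=0):
    """Walk a hierarchical clustering tree of the features top-down.

    A block of features is dropped as soon as the test finds no
    difference on it; significant single features are selected.
    """
    rng = np.random.default_rng(seed)
    root = to_tree(linkage(X.T, method="average"))  # cluster the columns
    selected = []

    def visit(node):
        cols = node.pre_order()  # feature indices under this node
        if graph_test_pvalue(X[:, cols], labels, n_perm, rng) > alpha:
            return
        if node.is_leaf():
            selected.append(node.id)
        else:
            visit(node.get_left())
            visit(node.get_right())

    visit(root)
    return sorted(selected)


# Synthetic example: K = 2 groups, only features 0 and 1 differ.
data_rng = np.random.default_rng(42)
labels = np.repeat([0, 1], 50)
X = data_rng.normal(size=(100, 6))
X[labels == 1, :2] += 4.0
selected = select_features(X, labels)
print(selected)
```

Testing whole blocks of features first means that a large block with no signal can be discarded with a single test, which is the practical appeal of descending a tree rather than testing every feature separately.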



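Feature selection

Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features from an existing set of M features so that a specified metric of the system is optimized. It picks the most effective of the original features in order to reduce the dimensionality of the dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a model.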
