Feature Selection via Local Learning: Local-Learning-Based Feature Selection

September 20, 2019 · 我爱读PAMI

This post is about how to select features so that a data point ends up as close as possible to its friends while staying far away from its enemies. The catch is: how do you know which nearby point is your closest friend and which is your closest enemy in the first place? In the darkness of a high-dimensional space, feature selection lights up the small patch around you, so you can stick with your own people and keep clear of the enemy. The first author is my former boss, Dr. Yijun Sun, now a professor at UB. The young lady at the front desk always called him Dr. Sun (太阳, "the sun").







Local-Learning-Based Feature Selection for High-Dimensional Data Analysis

Yijun Sun; Sinisa Todorovic; Steve Goodison

IEEE Transactions on Pattern Analysis and Machine Intelligence

Year: 2010 | Volume: 32, Issue: 9 | Journal Article | Publisher: IEEE


This paper considers feature selection for data classification in the presence of a huge number of irrelevant features. We propose a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning, and then learn feature relevance globally within the large margin framework. The proposed algorithm is based on well-established machine learning and numerical analysis techniques, without making any assumptions about the underlying data distribution. It is capable of processing many thousands of features within minutes on a personal computer while maintaining a very high accuracy that is nearly insensitive to a growing number of irrelevant features. Theoretical analyses of the algorithm's sample complexity suggest that the algorithm has a logarithmic sample complexity with respect to the number of features. Experiments on 11 synthetic and real-world data sets demonstrate the viability of our formulation of the feature-selection problem for supervised learning and the effectiveness of our algorithm.
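
To make the local-learning idea concrete, below is a minimal Python sketch of the general recipe, not the authors' exact algorithm: the paper uses probabilistically weighted nearest hits and misses and a dedicated solver within the large-margin framework, which are simplified here to hard RELIEF-style neighbors and a projected-gradient update; all function names and the toy data are illustrative.

```python
import numpy as np

def local_margin_vectors(X, y, w):
    """For each sample, compute z_n = |x_n - nearest_miss| - |x_n - nearest_hit|
    (feature-wise), where 'nearest' is measured by the weighted Manhattan distance
    sum_j w_j |x_nj - x_ij|. Hard neighbors are a simplification of the paper's
    probabilistic treatment."""
    n = X.shape[0]
    Z = np.zeros_like(X)
    for i in range(n):
        diff = np.abs(X - X[i])                      # feature-wise |x_i - x_k| for all k
        dist = diff @ w                              # weighted Manhattan distances
        dist[i] = np.inf                             # exclude the point itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class point
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class point
        Z[i] = diff[miss] - diff[hit]
    return Z

def local_learning_feature_weights(X, y, lam=1.0, lr=0.05, iters=200):
    """Learn nonnegative feature weights by minimizing a logistic loss on the local
    margins plus an l1 penalty, via projected gradient (a sketch, not the paper's
    fixed-point solver)."""
    w = np.ones(X.shape[1])
    for _ in range(iters):
        Z = local_margin_vectors(X, y, w)            # re-estimate neighbors under current w
        m = np.clip(Z @ w, -30, 30)                  # per-sample margins
        grad = -(Z * (1.0 / (1.0 + np.exp(m)))[:, None]).sum(axis=0) + lam
        w = np.maximum(w - lr * grad, 0.0)           # keep weights nonnegative
    return w

# Toy check: two informative features plus eight noise features; the informative
# ones should end up with clearly larger weights.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100)
X = np.c_[y + 0.3 * rng.standard_normal(100),
          -y + 0.3 * rng.standard_normal(100),
          rng.standard_normal((100, 8))]
print(local_learning_feature_weights(X, y).round(2))
```

Re-estimating the neighbors under the current weights at every pass is the step that turns an otherwise nonlinear problem into a sequence of locally linear ones, in the spirit of the decomposition described in the abstract.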



Related Content

Feature selection, also known as feature subset selection (FSS) or attribute selection, refers to choosing N features out of an existing set of M features so that a specified criterion of the system is optimized. It is the process of picking the most effective features from the original ones in order to reduce the dimensionality of a data set; it is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For any learning algorithm, good training samples are the key to training a good model.
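
As a concrete (and deliberately generic) illustration of "pick the N most useful of M features to optimize a criterion", here is a minimal filter-style sketch in Python; the class-separation score is an arbitrary choice for illustration, and `X`, `y` are assumed to be a feature matrix and binary labels.

```python
import numpy as np

def top_n_features(X, y, n):
    """Filter-style selection: rank features by a simple class-separation score
    (difference of class means over the pooled standard deviation) and keep the
    n highest-scoring ones."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    score = np.abs(mu0 - mu1) / (X.std(axis=0) + 1e-12)
    return np.argsort(score)[::-1][:n]               # indices of the selected features

# e.g. selected = top_n_features(X, y, n=10) keeps the 10 most discriminative columns
```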

The questions "can machines think" and "can machines do what humans do" drive the development of artificial intelligence. Although recent artificial intelligence succeeds in many data-intensive applications, it still lacks the ability to learn from limited exemplars and to generalize quickly to new tasks. To tackle this problem, one has to turn to machine learning, which supports the scientific study of artificial intelligence. In particular, a machine learning problem called Few-Shot Learning (FSL) targets this case. It can rapidly generalize to new tasks with limited supervised experience by turning to prior knowledge, which mimics humans' ability to acquire knowledge from few examples through generalization and analogy. It has been seen as a test-bed for real artificial intelligence, a way to reduce laborious data gathering and computationally costly training, and an antidote for learning from rare cases. With extensive work on FSL emerging, we give a comprehensive survey of it. We first give the formal definition of FSL. Then we point out the core issues of FSL, which turns the problem from "how to solve FSL" into "how to deal with the core issues". Accordingly, existing works from the birth of FSL to the most recently published ones are categorized in a unified taxonomy, with a thorough discussion of the pros and cons of the different categories. Finally, we envision possible future directions for FSL in terms of problem setup, techniques, applications and theory, hoping to provide insights both to beginners and to experienced researchers.


Because of continuous advances in mathematical programming, Mixed Integer Optimization has become competitive with popular regularization methods for selecting features in regression problems. The approach exhibits unquestionable foundational appeal and versatility, but also poses important challenges. We tackle these challenges, reducing the computational burden of tuning the sparsity bound (a parameter which is critical for effectiveness) and improving performance in the presence of feature collinearity and of signals that vary in nature and strength. Importantly, we render the approach efficient and effective in applications of realistic size and complexity, without resorting to relaxations or heuristics in the optimization, and without abandoning rigorous cross-validation tuning. Computational viability and improved performance in subtler scenarios are achieved with a multi-pronged blueprint that leverages characteristics of the Mixed Integer Programming framework and employs whitening, a data pre-processing step.
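
To illustrate what the sparsity bound in this line of work constrains, here is a brute-force sketch of L0-constrained (best-subset) regression in Python; a real Mixed Integer Optimization formulation solves the same problem far more efficiently, so this is only meant to show the objective, and all names are illustrative.

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, k):
    """Exhaustively try every subset of exactly k features and return the one with
    the lowest least-squares error: the L0-constrained problem an MIO formulation
    solves, feasible here only because the search is brute force."""
    best_err, best_idx = np.inf, None
    for idx in combinations(range(X.shape[1]), k):
        Xs = X[:, list(idx)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        err = np.sum((y - Xs @ beta) ** 2)
        if err < best_err:
            best_err, best_idx = err, idx
    return best_idx, best_err

# best_subset(X, y, k=3) approximates: minimize ||y - X beta||^2 s.t. ||beta||_0 <= 3
```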


Feature Selection Library (FSLib) is a widely applicable MATLAB library for Feature Selection (FS). FS is an essential component of machine learning and data mining which has been studied for many years under many different conditions and in diverse scenarios. These algorithms aim at ranking and selecting a subset of relevant features according to their degrees of relevance, preference, or importance as defined in a specific application. Because feature selection can reduce the number of features used for training classification models, it alleviates the effect of the curse of dimensionality, speeds up the learning process, improves model performance, and enhances data understanding. This short report provides an overview of the feature selection algorithms included in the FSLib MATLAB toolbox, covering filter, embedded, and wrapper methods.
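
FSLib itself is MATLAB, but the wrapper idea it covers can be sketched generically in Python (greedy forward selection scored by cross-validated accuracy; scikit-learn is assumed to be available, and the estimator choice is arbitrary):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, n_features):
    """Greedy wrapper: repeatedly add the feature whose inclusion gives the best
    cross-validated accuracy of the downstream classifier."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_features):
        scores = [(cross_val_score(LogisticRegression(max_iter=1000),
                                   X[:, selected + [j]], y, cv=3).mean(), j)
                  for j in remaining]
        _, best_j = max(scores)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# forward_select(X, y, n_features=5) returns the indices of the five chosen features
```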


Accurately classifying the malignancy of lesions detected in a screening scan plays a critical role in reducing false positives. By extracting and analyzing a large number of quantitative image features, radiomics holds great potential to differentiate malignant tumors from benign ones. Since not all radiomic features contribute to an effective classification model, selecting an optimal feature subset is critical. This work proposes a new multi-objective based feature selection (MO-FS) algorithm that considers sensitivity and specificity simultaneously as the objective functions during feature selection. In MO-FS, we developed a modified entropy-based termination criterion (METC) to stop the algorithm automatically rather than relying on a preset number of generations. We also designed a solution selection methodology for multi-objective learning using the evidential reasoning approach (SMOLER) to automatically select the optimal solution from the Pareto-optimal set. Furthermore, an adaptive mutation operation was developed to generate the mutation probability in MO-FS automatically. MO-FS was evaluated for classifying lung nodule malignancy in low-dose CT and breast lesion malignancy in digital breast tomosynthesis. Compared with other commonly used feature selection methods, the experimental results for both lung nodule and breast lesion malignancy classification demonstrated that the feature set selected by MO-FS achieved better classification performance.
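
The multi-objective core, keeping only feature subsets that are not dominated in (sensitivity, specificity) space, can be sketched as follows; this shows just the Pareto-filtering step, not the evolutionary search, METC, or SMOLER components described above, and the example numbers are made up.

```python
def pareto_front(candidates):
    """candidates: list of (sensitivity, specificity, feature_subset) triples.
    Keep the subsets not dominated by any other candidate, i.e. no other candidate
    is at least as good on both objectives and strictly better on one."""
    front = []
    for sens, spec, subset in candidates:
        dominated = any(s2 >= sens and p2 >= spec and (s2, p2) != (sens, spec)
                        for s2, p2, _ in candidates)
        if not dominated:
            front.append((sens, spec, subset))
    return front

# made-up evaluations of three feature subsets:
candidates = [(0.91, 0.62, {0, 3}), (0.85, 0.80, {1, 4}), (0.84, 0.79, {2})]
print(pareto_front(candidates))   # keeps the first two; the third is dominated by the second
```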


We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
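
The relational step at the heart of this approach, self-attention over a set of entity vectors, can be sketched in a few lines of numpy; this is the generic scaled dot-product formulation rather than the authors' exact architecture, and all shapes and names are illustrative.

```python
import numpy as np

def self_attention(E, Wq, Wk, Wv):
    """E: (n_entities, d) entity features. Every entity attends to every other,
    yielding relation-aware entity representations."""
    Q, K, V = E @ Wq, E @ Wk, E @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # pairwise compatibilities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over entities
    return weights @ V                                # attention-weighted mixtures

# toy usage: 5 entities of dimension 8 with random projections
rng = np.random.default_rng(0)
E = rng.standard_normal((5, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(self_attention(E, Wq, Wk, Wv).shape)            # (5, 8)
```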


Meta-learning is a powerful tool that builds on multi-task learning to learn how to quickly adapt a model to new tasks. In the context of reinforcement learning, meta-learning algorithms can acquire reinforcement learning procedures to solve new problems more efficiently by meta-learning prior tasks. The performance of meta-learning algorithms critically depends on the tasks available for meta-training: in the same way that supervised learning algorithms generalize best to test points drawn from the same distribution as the training points, meta-learning methods generalize best to tasks from the same distribution as the meta-training tasks. In effect, meta-reinforcement learning offloads the design burden from algorithm design to task design. If we can automate the process of task design as well, we can devise a meta-learning algorithm that is truly automated. In this work, we take a step in this direction, proposing a family of unsupervised meta-learning algorithms for reinforcement learning. We describe a general recipe for unsupervised meta-reinforcement learning, and describe an effective instantiation of this approach based on a recently proposed unsupervised exploration technique and model-agnostic meta-learning. We also discuss practical and conceptual considerations for developing unsupervised meta-learning methods. Our experimental results demonstrate that unsupervised meta-reinforcement learning effectively acquires accelerated reinforcement learning procedures without the need for manual task design, significantly exceeds the performance of learning from scratch, and even matches performance of meta-learning methods that use hand-specified task distributions.


This research mainly emphasizes traffic detection, which essentially involves object detection and classification. The particular work discussed here is motivated by unsatisfactory attempts to reuse well-known pre-trained object detection networks on domain-specific data. In the course of this work, some seemingly trivial issues that lead to a prominent performance drop are identified, and ways to resolve them are discussed. For example, some simple yet relevant tricks regarding data collection and sampling prove to be very beneficial. Introducing a blur net to deal with blurred real-time data is another important factor in improving performance. We further study neural network design issues for effective object classification and employ shared, region-independent convolutional features. Adaptive learning rates to deal with saddle points are also investigated, and an average-covariance-matrix-based pre-conditioned approach is proposed. We also introduce the use of optical flow features to incorporate orientation information. Experimental results demonstrate that this yields a steady rise in performance.


In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. The optimal cost function of the aggregate problem, a nonlinear function of the features, serves as an architecture for approximation in value space of the optimal cost function or the cost functions of policies of the original problem. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with reinforcement learning based on deep neural networks, which is used to obtain the needed features. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by deep reinforcement learning, thereby potentially leading to more effective policy improvement.
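
A compact sketch of the hard-aggregation special case, here only evaluating a fixed policy rather than running the full approximate policy iteration the paper develops; the feature map `phi`, the matrices, and the sizes are all illustrative, and every aggregate state is assumed to have at least one member.

```python
import numpy as np

def hard_aggregation_values(P, c, phi, n_agg, gamma=0.95, iters=500):
    """P: (S, S) transition matrix of a fixed policy, c: (S,) stage costs,
    phi: (S,) integer feature (aggregate-state index) of each original state.
    Build the aggregate problem by averaging over the states that share a feature
    value, solve it by value iteration, and lift the result back to original states."""
    S = len(c)
    D = np.zeros((n_agg, S))                  # disaggregation: uniform over members
    for x in range(n_agg):
        members = (phi == x)                  # assumes every aggregate state is nonempty
        D[x, members] = 1.0 / members.sum()
    A = np.zeros((S, n_agg))                  # aggregation (membership) matrix
    A[np.arange(S), phi] = 1.0
    P_agg = D @ P @ A                         # aggregate transition probabilities
    c_agg = D @ c                             # aggregate stage costs
    J = np.zeros(n_agg)
    for _ in range(iters):                    # value iteration on the small problem
        J = c_agg + gamma * P_agg @ J
    return A @ J                              # approximate cost-to-go of original states

# toy usage: 6 states, a random policy's chain, 2 aggregate states defined by a feature
rng = np.random.default_rng(0)
P = rng.random((6, 6)); P /= P.sum(axis=1, keepdims=True)
c = rng.random(6)
phi = np.array([0, 0, 0, 1, 1, 1])
print(hard_aggregation_values(P, c, phi, n_agg=2))
```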


In recent years, person re-identification (re-id) has attracted great attention in both the computer vision community and industry. In this paper, we propose a new framework for person re-identification based on triplet deep similarity learning with convolutional neural networks (CNNs). The network is trained with triplet inputs: two share the same class label and the third is from a different class. It aims to learn a deep feature representation under which the distance within the same class is decreased while the distance between different classes is increased as much as possible. Moreover, we train the model jointly on six different datasets, which differs from the common practice of training one model on a single dataset and testing it on that same dataset. However, the enormous number of possible triplets among the large number of training samples makes training infeasible. To address this challenge, a double-sampling scheme is proposed to generate triplets of images as effectively as possible. The proposed framework is evaluated on several benchmark datasets. The experimental results show that our method is effective for the task of person re-identification and is comparable to or even outperforms state-of-the-art methods.
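
The triplet objective itself is easy to state; below is a minimal numpy version of the standard margin-based triplet loss (the exact loss and CNN architecture in the paper may differ; this is the generic formulation, with illustrative names).

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss on embedding vectors: the anchor-positive distance
    should be smaller than the anchor-negative distance by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# anchor and positive come from the same identity, negative from a different one;
# a batch loss averages this over sampled triplets.
```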


When humans perform inductive learning, they often enhance the process with background knowledge. With the increasing availability of well-formed collaborative knowledge bases, the performance of learning algorithms could be significantly enhanced if a way were found to exploit these knowledge bases. In this work, we present a novel algorithm for injecting external knowledge into induction algorithms using feature generation. Given a feature, the algorithm defines a new learning task over its set of values, and uses the knowledge base to solve the constructed learning task. The resulting classifier is then used as a new feature for the original problem. We have applied our algorithm to the domain of text classification using large semantic knowledge bases. We have shown that the generated features significantly improve the performance of existing learning algorithms.
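
A schematic Python sketch of the feature-generation loop described here: define an auxiliary task over one feature's values, use external knowledge to label and represent those values, train a classifier on that task, and append its prediction as a new feature. Every function name, the toy knowledge base, and the data below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def add_knowledge_feature(X, feature_values, kb_label, kb_repr):
    """feature_values: raw values of one original feature (e.g. a word per document).
    kb_label(v) -> auxiliary class of value v, derived from the knowledge base.
    kb_repr(v)  -> vector describing v in the knowledge base.
    Train a classifier on the auxiliary (binary) task and append its predicted
    probability to X as a newly generated feature."""
    V = np.array([kb_repr(v) for v in feature_values])
    t = np.array([kb_label(v) for v in feature_values])
    aux = LogisticRegression(max_iter=1000).fit(V, t)
    return np.c_[X, aux.predict_proba(V)[:, 1]]

# toy mock of the knowledge base: word -> (embedding, is-a-sport flag)
kb = {"football": ([1.0, 0.2], 1), "senate": ([0.1, 0.9], 0),
      "tennis":   ([0.9, 0.1], 1), "election": ([0.2, 0.8], 0)}
X = np.zeros((4, 3))                                   # dummy original features
words = ["football", "senate", "tennis", "election"]
X_new = add_knowledge_feature(X, words, lambda w: kb[w][1], lambda w: kb[w][0])
print(X_new.shape)                                     # (4, 4): one induced feature appended
```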

Related Papers

Few-shot Learning: A Survey. Yaqing Wang, Quanming Yao. April 10, 2019.

Efficient and Effective $L_0$ Feature Selection. Ana Kenney, Francesca Chiaromonte, Giovanni Felici. August 7, 2018.

Feature Selection Library (MATLAB Toolbox). Giorgio Roffo. August 6, 2018.

Automatic multi-objective based feature selection for classification. Zhiguo Zhou, Shulong Li, Genggeng Qin, Michael Folkert, Steve Jiang, Jing Wang. July 9, 2018.

Relational Deep Reinforcement Learning. Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia. June 28, 2018.

Abhishek Gupta, Benjamin Eysenbach, Chelsea Finn, Sergey Levine. June 12, 2018.

Wentong Liao, Michael Ying Yang, Ni Zhan, Bodo Rosenhahn. February 9, 2018.

Lior Friedman, Shaul Markovitch. January 31, 2018.