Feature Selection via Local Learning: Local-Learning-Based Feature Selection

September 20, 2019 · 我爱读PAMI

This post is about how to select features so that a data point ends up as close as possible to its friends while staying far away from its enemies. The catch is: how do you know which nearby point is your closest friend and which is your closest enemy in the first place? In the darkness of a high-dimensional space, feature selection lights up the small patch around you, so you can stick with your own people and keep clear of the enemy. The first author is my former boss, Dr. Yijun Sun, now a professor at UB. The young lady at the front desk always called him Dr. Sun (太阳, "the sun").







Local-Learning-Based Feature Selection for High-Dimensional Data Analysis

Yijun Sun; Sinisa Todorovic; Steve Goodison

IEEE Transactions on Pattern Analysis and Machine Intelligence

Year: 2010 | Volume: 32, Issue: 9 | Journal Article | Publisher: IEEE


This paper considers feature selection for data classification in the presence of a huge number of irrelevant features. We propose a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning, and then learn feature relevance globally within the large margin framework. The proposed algorithm is based on well-established machine learning and numerical analysis techniques, without making any assumptions about the underlying data distribution. It is capable of processing many thousands of features within minutes on a personal computer while maintaining a very high accuracy that is nearly insensitive to a growing number of irrelevant features. Theoretical analyses of the algorithm's sample complexity suggest that the algorithm has a logarithmic sample complexity with respect to the number of features. Experiments on 11 synthetic and real-world data sets demonstrate the viability of our formulation of the feature-selection problem for supervised learning and the effectiveness of our algorithm.
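
To make the local-learning idea concrete, below is a minimal Python sketch of the general recipe, not the authors' exact algorithm: the paper uses probabilistically weighted nearest hits and misses and a dedicated solver within the large-margin framework, which are simplified here to hard RELIEF-style neighbors and a projected-gradient update; all function names and the toy data are illustrative.

```python
import numpy as np

def local_margin_vectors(X, y, w):
    """For each sample, compute z_n = |x_n - nearest_miss| - |x_n - nearest_hit|
    (feature-wise), where 'nearest' is measured by the weighted Manhattan distance
    sum_j w_j |x_nj - x_ij|. Hard neighbors are a simplification of the paper's
    probabilistic treatment."""
    n = X.shape[0]
    Z = np.zeros_like(X)
    for i in range(n):
        diff = np.abs(X - X[i])                      # feature-wise |x_i - x_k| for all k
        dist = diff @ w                              # weighted Manhattan distances
        dist[i] = np.inf                             # exclude the point itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class point
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class point
        Z[i] = diff[miss] - diff[hit]
    return Z

def local_learning_feature_weights(X, y, lam=1.0, lr=0.05, iters=200):
    """Learn nonnegative feature weights by minimizing a logistic loss on the local
    margins plus an l1 penalty, via projected gradient (a sketch, not the paper's
    fixed-point solver)."""
    w = np.ones(X.shape[1])
    for _ in range(iters):
        Z = local_margin_vectors(X, y, w)            # re-estimate neighbors under current w
        m = np.clip(Z @ w, -30, 30)                  # per-sample margins
        grad = -(Z * (1.0 / (1.0 + np.exp(m)))[:, None]).sum(axis=0) + lam
        w = np.maximum(w - lr * grad, 0.0)           # keep weights nonnegative
    return w

# Toy check: two informative features plus eight noise features; the informative
# ones should end up with clearly larger weights.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100)
X = np.c_[y + 0.3 * rng.standard_normal(100),
          -y + 0.3 * rng.standard_normal(100),
          rng.standard_normal((100, 8))]
print(local_learning_feature_weights(X, y).round(2))
```

Re-estimating the neighbors under the current weights at every pass is the step that turns an otherwise nonlinear problem into a sequence of locally linear ones, in the spirit of the decomposition described in the abstract.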



Related Content

Feature selection, also known as feature subset selection (FSS) or attribute selection, refers to choosing N features out of an existing set of M features so that a specified criterion of the system is optimized. It is the process of picking the most effective features from the original ones in order to reduce the dimensionality of a data set; it is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For any learning algorithm, good training samples are the key to training a good model.
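
As a concrete (and deliberately generic) illustration of "pick the N most useful of M features to optimize a criterion", here is a minimal filter-style sketch in Python; the class-separation score is an arbitrary choice for illustration, and `X`, `y` are assumed to be a feature matrix and binary labels.

```python
import numpy as np

def top_n_features(X, y, n):
    """Filter-style selection: rank features by a simple class-separation score
    (difference of class means over the pooled standard deviation) and keep the
    n highest-scoring ones."""
    mu0, mu1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    score = np.abs(mu0 - mu1) / (X.std(axis=0) + 1e-12)
    return np.argsort(score)[::-1][:n]               # indices of the selected features

# e.g. selected = top_n_features(X, y, n=10) keeps the 10 most discriminative columns
```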

The questions "can machines think" and "can machines do what humans do" drive the development of artificial intelligence. Although recent artificial intelligence succeeds in many data-intensive applications, it still lacks the ability to learn from limited exemplars and to generalize quickly to new tasks. To tackle this problem, one has to turn to machine learning, which supports the scientific study of artificial intelligence. In particular, a machine learning problem called Few-Shot Learning (FSL) targets this case. It can rapidly generalize to new tasks with limited supervised experience by turning to prior knowledge, which mimics humans' ability to acquire knowledge from few examples through generalization and analogy. It has been seen as a test-bed for real artificial intelligence, a way to reduce laborious data gathering and computationally costly training, and an antidote for learning from rare cases. With extensive work on FSL emerging, we give a comprehensive survey of it. We first give the formal definition of FSL. Then we point out the core issues of FSL, which turns the problem from "how to solve FSL" into "how to deal with the core issues". Accordingly, existing works from the birth of FSL to the most recently published ones are categorized in a unified taxonomy, with a thorough discussion of the pros and cons of the different categories. Finally, we envision possible future directions for FSL in terms of problem setup, techniques, applications and theory, hoping to provide insights both to beginners and to experienced researchers.


Because of continuous advances in mathematical programming, Mixed Integer Optimization has become competitive with popular regularization methods for selecting features in regression problems. The approach exhibits unquestionable foundational appeal and versatility, but also poses important challenges. We tackle these challenges, reducing the computational burden of tuning the sparsity bound (a parameter which is critical for effectiveness) and improving performance in the presence of feature collinearity and of signals that vary in nature and strength. Importantly, we render the approach efficient and effective in applications of realistic size and complexity, without resorting to relaxations or heuristics in the optimization, and without abandoning rigorous cross-validation tuning. Computational viability and improved performance in subtler scenarios are achieved with a multi-pronged blueprint that leverages characteristics of the Mixed Integer Programming framework and employs whitening, a data pre-processing step.
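
To illustrate what the sparsity bound in this line of work constrains, here is a brute-force sketch of L0-constrained (best-subset) regression in Python; a real Mixed Integer Optimization formulation solves the same problem far more efficiently, so this is only meant to show the objective, and all names are illustrative.

```python
import numpy as np
from itertools import combinations

def best_subset(X, y, k):
    """Exhaustively try every subset of exactly k features and return the one with
    the lowest least-squares error: the L0-constrained problem an MIO formulation
    solves, feasible here only because the search is brute force."""
    best_err, best_idx = np.inf, None
    for idx in combinations(range(X.shape[1]), k):
        Xs = X[:, list(idx)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        err = np.sum((y - Xs @ beta) ** 2)
        if err < best_err:
            best_err, best_idx = err, idx
    return best_idx, best_err

# best_subset(X, y, k=3) approximates: minimize ||y - X beta||^2 s.t. ||beta||_0 <= 3
```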


Feature Selection Library (FSLib) is a widely applicable MATLAB library for Feature Selection (FS). FS is an essential component of machine learning and data mining which has been studied for many years under many different conditions and in diverse scenarios. These algorithms aim at ranking and selecting a subset of relevant features according to their degrees of relevance, preference, or importance as defined in a specific application. Because feature selection can reduce the number of features used for training classification models, it alleviates the effect of the curse of dimensionality, speeds up the learning process, improves model performance, and enhances data understanding. This short report provides an overview of the feature selection algorithms included in the FSLib MATLAB toolbox, covering filter, embedded, and wrapper methods.
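
FSLib itself is MATLAB, but the wrapper idea it covers can be sketched generically in Python (greedy forward selection scored by cross-validated accuracy; scikit-learn is assumed to be available, and the estimator choice is arbitrary):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, n_features):
    """Greedy wrapper: repeatedly add the feature whose inclusion gives the best
    cross-validated accuracy of the downstream classifier."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(n_features):
        scores = [(cross_val_score(LogisticRegression(max_iter=1000),
                                   X[:, selected + [j]], y, cv=3).mean(), j)
                  for j in remaining]
        _, best_j = max(scores)
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# forward_select(X, y, n_features=5) returns the indices of the five chosen features
```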


Accurately classifying the malignancy of lesions detected in a screening scan plays a critical role in reducing false positives. By extracting and analyzing a large number of quantitative image features, radiomics holds great potential to differentiate malignant tumors from benign ones. Since not all radiomic features contribute to an effective classification model, selecting an optimal feature subset is critical. This work proposes a new multi-objective based feature selection (MO-FS) algorithm that considers sensitivity and specificity simultaneously as the objective functions during feature selection. In MO-FS, we developed a modified entropy-based termination criterion (METC) to stop the algorithm automatically rather than relying on a preset number of generations. We also designed a solution selection methodology for multi-objective learning using the evidential reasoning approach (SMOLER) to automatically select the optimal solution from the Pareto-optimal set. Furthermore, an adaptive mutation operation was developed to generate the mutation probability in MO-FS automatically. MO-FS was evaluated for classifying lung nodule malignancy in low-dose CT and breast lesion malignancy in digital breast tomosynthesis. Compared with other commonly used feature selection methods, the experimental results for both lung nodule and breast lesion malignancy classification demonstrated that the feature set selected by MO-FS achieved better classification performance.
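
The multi-objective core, keeping only feature subsets that are not dominated in (sensitivity, specificity) space, can be sketched as follows; this shows just the Pareto-filtering step, not the evolutionary search, METC, or SMOLER components described above, and the example numbers are made up.

```python
def pareto_front(candidates):
    """candidates: list of (sensitivity, specificity, feature_subset) triples.
    Keep the subsets not dominated by any other candidate, i.e. no other candidate
    is at least as good on both objectives and strictly better on one."""
    front = []
    for sens, spec, subset in candidates:
        dominated = any(s2 >= sens and p2 >= spec and (s2, p2) != (sens, spec)
                        for s2, p2, _ in candidates)
        if not dominated:
            front.append((sens, spec, subset))
    return front

# made-up evaluations of three feature subsets:
candidates = [(0.91, 0.62, {0, 3}), (0.85, 0.80, {1, 4}), (0.84, 0.79, {2})]
print(pareto_front(candidates))   # keeps the first two; the third is dominated by the second
```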


We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning. It uses self-attention to iteratively reason about the relations between entities in a scene and to guide a model-free policy. Our results show that in a novel navigation and planning task called Box-World, our agent finds interpretable solutions that improve upon baselines in terms of sample complexity, ability to generalize to more complex scenes than experienced during training, and overall performance. In the StarCraft II Learning Environment, our agent achieves state-of-the-art performance on six mini-games -- surpassing human grandmaster performance on four. By considering architectural inductive biases, our work opens new directions for overcoming important, but stubborn, challenges in deep RL.
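
The relational step at the heart of this approach, self-attention over a set of entity vectors, can be sketched in a few lines of numpy; this is the generic scaled dot-product formulation rather than the authors' exact architecture, and all shapes and names are illustrative.

```python
import numpy as np

def self_attention(E, Wq, Wk, Wv):
    """E: (n_entities, d) entity features. Every entity attends to every other,
    yielding relation-aware entity representations."""
    Q, K, V = E @ Wq, E @ Wk, E @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])            # pairwise compatibilities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over entities
    return weights @ V                                # attention-weighted mixtures

# toy usage: 5 entities of dimension 8 with random projections
rng = np.random.default_rng(0)
E = rng.standard_normal((5, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(self_attention(E, Wq, Wk, Wv).shape)            # (5, 8)
```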


Meta-learning is a powerful tool that builds on multi-task learning to learn how to quickly adapt a model to new tasks. In the context of reinforcement learning, meta-learning algorithms can acquire reinforcement learning procedures to solve new problems more efficiently by meta-learning prior tasks. The performance of meta-learning algorithms critically depends on the tasks available for meta-training: in the same way that supervised learning algorithms generalize best to test points drawn from the same distribution as the training points, meta-learning methods generalize best to tasks from the same distribution as the meta-training tasks. In effect, meta-reinforcement learning offloads the design burden from algorithm design to task design. If we can automate the process of task design as well, we can devise a meta-learning algorithm that is truly automated. In this work, we take a step in this direction, proposing a family of unsupervised meta-learning algorithms for reinforcement learning. We describe a general recipe for unsupervised meta-reinforcement learning, and describe an effective instantiation of this approach based on a recently proposed unsupervised exploration technique and model-agnostic meta-learning. We also discuss practical and conceptual considerations for developing unsupervised meta-learning methods. Our experimental results demonstrate that unsupervised meta-reinforcement learning effectively acquires accelerated reinforcement learning procedures without the need for manual task design, significantly exceeds the performance of learning from scratch, and even matches performance of meta-learning methods that use hand-specified task distributions.


This research mainly emphasizes traffic detection, which essentially involves object detection and classification. The particular work discussed here is motivated by unsatisfactory attempts to reuse well-known pre-trained object detection networks on domain-specific data. In the course of this work, some seemingly trivial issues that lead to a prominent performance drop are identified, and ways to resolve them are discussed. For example, some simple yet relevant tricks regarding data collection and sampling prove to be very beneficial. Introducing a blur net to deal with blurred real-time data is another important factor in improving performance. We further study neural network design issues for effective object classification and employ shared, region-independent convolutional features. Adaptive learning rates to deal with saddle points are also investigated, and an average-covariance-matrix-based pre-conditioned approach is proposed. We also introduce the use of optical flow features to incorporate orientation information. Experimental results demonstrate that this yields a steady rise in performance.


In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. The optimal cost function of the aggregate problem, a nonlinear function of the features, serves as an architecture for approximation in value space of the optimal cost function or the cost functions of policies of the original problem. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with reinforcement learning based on deep neural networks, which is used to obtain the needed features. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by deep reinforcement learning, thereby potentially leading to more effective policy improvement.
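
A compact sketch of the hard-aggregation special case, here only evaluating a fixed policy rather than running the full approximate policy iteration the paper develops; the feature map `phi`, the matrices, and the sizes are all illustrative, and every aggregate state is assumed to have at least one member.

```python
import numpy as np

def hard_aggregation_values(P, c, phi, n_agg, gamma=0.95, iters=500):
    """P: (S, S) transition matrix of a fixed policy, c: (S,) stage costs,
    phi: (S,) integer feature (aggregate-state index) of each original state.
    Build the aggregate problem by averaging over the states that share a feature
    value, solve it by value iteration, and lift the result back to original states."""
    S = len(c)
    D = np.zeros((n_agg, S))                  # disaggregation: uniform over members
    for x in range(n_agg):
        members = (phi == x)                  # assumes every aggregate state is nonempty
        D[x, members] = 1.0 / members.sum()
    A = np.zeros((S, n_agg))                  # aggregation (membership) matrix
    A[np.arange(S), phi] = 1.0
    P_agg = D @ P @ A                         # aggregate transition probabilities
    c_agg = D @ c                             # aggregate stage costs
    J = np.zeros(n_agg)
    for _ in range(iters):                    # value iteration on the small problem
        J = c_agg + gamma * P_agg @ J
    return A @ J                              # approximate cost-to-go of original states

# toy usage: 6 states, a random policy's chain, 2 aggregate states defined by a feature
rng = np.random.default_rng(0)
P = rng.random((6, 6)); P /= P.sum(axis=1, keepdims=True)
c = rng.random(6)
phi = np.array([0, 0, 0, 1, 1, 1])
print(hard_aggregation_values(P, c, phi, n_agg=2))
```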


In recent years, person re-identification (re-id) has attracted great attention in both the computer vision community and industry. In this paper, we propose a new framework for person re-identification based on triplet deep similarity learning with convolutional neural networks (CNNs). The network is trained with triplet inputs: two share the same class label and the third is from a different class. It aims to learn a deep feature representation under which the distance within the same class is decreased while the distance between different classes is increased as much as possible. Moreover, we train the model jointly on six different datasets, which differs from the common practice of training one model on a single dataset and testing it on that same dataset. However, the enormous number of possible triplets among the large number of training samples makes training infeasible. To address this challenge, a double-sampling scheme is proposed to generate triplets of images as effectively as possible. The proposed framework is evaluated on several benchmark datasets. The experimental results show that our method is effective for the task of person re-identification and is comparable to or even outperforms state-of-the-art methods.
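
The triplet objective itself is easy to state; below is a minimal numpy version of the standard margin-based triplet loss (the exact loss and CNN architecture in the paper may differ; this is the generic formulation, with illustrative names).

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss on embedding vectors: the anchor-positive distance
    should be smaller than the anchor-negative distance by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# anchor and positive come from the same identity, negative from a different one;
# a batch loss averages this over sampled triplets.
```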


When humans perform inductive learning, they often enhance the process with background knowledge. With the increasing availability of well-formed collaborative knowledge bases, the performance of learning algorithms could be significantly enhanced if a way were found to exploit these knowledge bases. In this work, we present a novel algorithm for injecting external knowledge into induction algorithms using feature generation. Given a feature, the algorithm defines a new learning task over its set of values, and uses the knowledge base to solve the constructed learning task. The resulting classifier is then used as a new feature for the original problem. We have applied our algorithm to the domain of text classification using large semantic knowledge bases. We have shown that the generated features significantly improve the performance of existing learning algorithms.
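
A schematic Python sketch of the feature-generation loop described here: define an auxiliary task over one feature's values, use external knowledge to label and represent those values, train a classifier on that task, and append its prediction as a new feature. Every function name, the toy knowledge base, and the data below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def add_knowledge_feature(X, feature_values, kb_label, kb_repr):
    """feature_values: raw values of one original feature (e.g. a word per document).
    kb_label(v) -> auxiliary class of value v, derived from the knowledge base.
    kb_repr(v)  -> vector describing v in the knowledge base.
    Train a classifier on the auxiliary (binary) task and append its predicted
    probability to X as a newly generated feature."""
    V = np.array([kb_repr(v) for v in feature_values])
    t = np.array([kb_label(v) for v in feature_values])
    aux = LogisticRegression(max_iter=1000).fit(V, t)
    return np.c_[X, aux.predict_proba(V)[:, 1]]

# toy mock of the knowledge base: word -> (embedding, is-a-sport flag)
kb = {"football": ([1.0, 0.2], 1), "senate": ([0.1, 0.9], 0),
      "tennis":   ([0.9, 0.1], 1), "election": ([0.2, 0.8], 0)}
X = np.zeros((4, 3))                                   # dummy original features
words = ["football", "senate", "tennis", "election"]
X_new = add_knowledge_feature(X, words, lambda w: kb[w][1], lambda w: kb[w][0])
print(X_new.shape)                                     # (4, 4): one induced feature appended
```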

Related Papers

Few-shot Learning: A Survey. Yaqing Wang, Quanming Yao. April 10, 2019.

Efficient and Effective $L_0$ Feature Selection. Ana Kenney, Francesca Chiaromonte, Giovanni Felici. August 7, 2018.

Feature Selection Library (MATLAB Toolbox). Giorgio Roffo. August 6, 2018.

Automatic multi-objective based feature selection for classification. Zhiguo Zhou, Shulong Li, Genggeng Qin, Michael Folkert, Steve Jiang, Jing Wang. July 9, 2018.

Relational Deep Reinforcement Learning. Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia. June 28, 2018.

Abhishek Gupta, Benjamin Eysenbach, Chelsea Finn, Sergey Levine. June 12, 2018.

Wentong Liao, Michael Ying Yang, Ni Zhan, Bodo Rosenhahn. February 9, 2018.

Lior Friedman, Shaul Markovitch. January 31, 2018.