Graph-based Extreme Feature Selection for Multi-class Classification Tasks

When processing high-dimensional datasets, a common pre-processing step is feature selection. Filter-based feature selection algorithms are not tailored to a specific classification method; instead, they rank the relevance of each feature with respect to the target and the task. This work focuses on a graph-based filter feature selection method that is suited to multi-class classification tasks. We aim to drastically reduce the number of selected features, in order to create a sketch of the original data that encodes valuable information for the classification task. The proposed graph-based algorithm is constructed by combining the Jeffries-Matusita distance with a non-linear dimension reduction method, diffusion maps. Feature elimination is performed based on the distribution of the features in the low-dimensional space. Then, a very small number of features that have complementary separation strengths are selected. Moreover, the low-dimensional embedding allows the feature space to be visualized. Experimental results are provided for public datasets and compared with known filter-based feature selection techniques.
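The abstract names two building blocks, the Jeffries-Matusita (JM) distance and diffusion maps, without giving formulas. The following is a minimal sketch of how they might be combined, not the authors' implementation. It assumes each feature is univariate Gaussian within each class (so the JM distance has a closed form via the Bhattacharyya distance), uses the convention JM = 2(1 - exp(-B)), and uses a Gaussian kernel with a median-based bandwidth. The function names `jm_matrix` and `diffusion_map` and all parameters are hypothetical, and the final elimination and selection rules described in the abstract are not reproduced here.

```python
import numpy as np


def jm_matrix(X, y):
    """Return an (n_features, n_class_pairs) matrix of JM distances.

    Assumption: each feature is univariate Gaussian within each class.
    Each row describes how well one feature separates every class pair.
    """
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-12
    columns = []
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            v_avg = 0.5 * (variances[i] + variances[j])
            # Bhattacharyya distance between two univariate Gaussians.
            b = (0.125 * (means[i] - means[j]) ** 2 / v_avg
                 + 0.5 * np.log(v_avg / np.sqrt(variances[i] * variances[j])))
            # JM distance, bounded in [0, 2].
            columns.append(2.0 * (1.0 - np.exp(-b)))
    return np.column_stack(columns)


def diffusion_map(F, n_components=2, epsilon=None):
    """Embed the rows of F (one row per feature) with a basic diffusion map."""
    sq_dists = ((F[:, None, :] - F[None, :, :]) ** 2).sum(axis=-1)
    if epsilon is None:
        # Assumed bandwidth heuristic: median of the non-zero squared distances.
        epsilon = np.median(sq_dists[sq_dists > 0])
    K = np.exp(-sq_dists / epsilon)
    d = K.sum(axis=1)
    # Symmetric normalization; it shares eigenvalues with the Markov matrix D^-1 K.
    A = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Convert to right eigenvectors of the Markov matrix.
    psi = vecs / np.sqrt(d)[:, None]
    # Skip the trivial first eigenvector (constant, eigenvalue 1).
    return psi[:, 1:n_components + 1] * vals[1:n_components + 1]
```

Calling `diffusion_map(jm_matrix(X, y))` on a samples-by-features array `X` with labels `y` gives one low-dimensional point per feature. Features with similar separation profiles across class pairs land close together, which is the kind of picture in which redundant features could be eliminated and complementary ones chosen.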



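Feature selection, also called feature subset selection (FSS) or attribute selection, means choosing N features from an existing set of M features so that a specified performance metric of the system is optimized. It is the process of picking the most effective of the original features in order to reduce the dimensionality of a dataset. It is an important means of improving the performance of learning algorithms and a key data pre-processing step in pattern recognition. For a learning algorithm, good training samples are the key to training the model.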
