Compactness Score: A Fast Filter Method for Unsupervised Feature Selection

In feature engineering, feature selection is an important research topic whose goal is to select "excellent" features from a set of candidates. Feature selection serves several purposes, such as reducing dimensionality, improving model effectiveness, and improving model performance. With the rapid growth of the information age, huge amounts of high-dimensional data are generated every day, and labeling such data requires great effort and time. Therefore, various algorithms have been proposed to handle such data, among which unsupervised feature selection has attracted tremendous interest. In many classification tasks, researchers have found that samples from the same class tend to lie close to each other; thus, local compactness is of great importance for evaluating a feature. In this manuscript, we propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features. To demonstrate its efficiency and accuracy, intensive experiments are performed on several data sets. The effectiveness and superiority of our method are then shown on clustering tasks. Here, performance is measured by several well-known evaluation metrics, while efficiency is reflected by the corresponding running time. As revealed by the simulation results, our proposed algorithm appears to be more accurate and efficient than existing algorithms.
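The abstract does not state the CSUFS formula, so the following is only a minimal sketch of the general idea of scoring features by local compactness, under stated assumptions: neighbours are found with a k-nearest-neighbour search over all features, and each feature is scored by the mean squared difference between a sample and its neighbours, divided by twice the feature's variance (a ratio in the spirit of the Laplacian Score). This is not the authors' method; the function names `knn_indices`, `local_compactness_scores`, and `rank_features` and the toy data are hypothetical.

```python
import math
from statistics import pvariance


def knn_indices(X, k):
    """For each sample, indices of its k nearest neighbours
    (Euclidean distance over all features, excluding the sample itself)."""
    neighbours = []
    for i, xi in enumerate(X):
        dists = sorted((math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def local_compactness_scores(X, k=2):
    """Hypothetical per-feature score (assumption, not the CSUFS formula):
    mean squared difference between each sample and its k neighbours,
    divided by 2 * the feature's population variance.
    Roughly 1 means neighbours are no closer than random pairs;
    values near 0 mean the feature is locally compact. Lower is better."""
    n, d = len(X), len(X[0])
    nbrs = knn_indices(X, k)
    scores = []
    for f in range(d):
        var = pvariance([row[f] for row in X])
        if var == 0:
            scores.append(math.inf)  # a constant feature carries no information
            continue
        local = sum((X[i][f] - X[j][f]) ** 2 for i in range(n) for j in nbrs[i])
        scores.append(local / (n * k) / (2 * var))
    return scores


def rank_features(X, k=2):
    """Feature indices ordered from most to least locally compact."""
    scores = local_compactness_scores(X, k)
    return sorted(range(len(scores)), key=lambda f: scores[f])


# Toy data: feature 0 separates two tight groups, feature 1 is scattered.
X = [
    [0.0, 5.0],
    [0.1, 1.0],
    [0.2, 9.0],
    [10.0, 2.0],
    [10.1, 8.0],
    [10.2, 4.0],
]
print(rank_features(X, k=2))  # [0, 1]
```

Dividing by the variance keeps a feature from looking "compact" merely because its values are on a small scale; without it, rescaling a feature would change its rank.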


Related content

Feature selection


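Feature selection, also called feature subset selection (FSS) or attribute selection, is the process of choosing N features from an existing set of M features so that a specific metric of the system is optimized. It selects the most effective features from the original ones to reduce the dimensionality of a dataset, it is an important means of improving the performance of learning algorithms, and it is a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a good model.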
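The definition above frames feature selection as choosing N of M features to optimize a criterion. A minimal sketch, assuming a hypothetical criterion that simply sums a per-feature score (population variance is used here purely as a stand-in, not the CSUFS score): exhaustive search evaluates all C(M, N) subsets, while a filter method scores each feature once and keeps the top N. The names `exhaustive_select` and `filter_select` and the toy data are illustrative only.

```python
from itertools import combinations
from math import comb
from statistics import pvariance


def feature_columns(X):
    """Transpose row-major samples into one list per feature."""
    return [list(col) for col in zip(*X)]


def exhaustive_select(X, n, score):
    """Try every N-subset of the M features and keep the one maximizing
    J(S) = sum of per-feature scores. Cost: C(M, N) subset evaluations."""
    cols = feature_columns(X)
    best = max(combinations(range(len(cols)), n),
               key=lambda s: sum(score(cols[f]) for f in s))
    return sorted(best)


def filter_select(X, n, score):
    """Filter shortcut: score each of the M features once and keep the top N."""
    cols = feature_columns(X)
    ranked = sorted(range(len(cols)), key=lambda f: score(cols[f]), reverse=True)
    return sorted(ranked[:n])


# Toy data: 4 samples, M = 4 features.
X = [
    [1, 10, 0.5, 3],
    [2, 30, 0.5, 3],
    [3, 20, 0.7, 9],
    [4, 40, 0.6, 3],
]
print(exhaustive_select(X, 2, pvariance), filter_select(X, 2, pvariance), comb(4, 2))
# [1, 3] [1, 3] 6
```

When the criterion is a sum of per-feature scores, the best N-subset is exactly the N highest-scoring features, so both functions agree; the filter needs only M score evaluations instead of C(M, N) subsets, which is one reason filter methods scale to high-dimensional data. Criteria that account for redundancy between features are not additive, and for those the two approaches can differ.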
