For feature engineering, feature selection is an important research topic that aims to select "excellent" features from a set of candidates. Feature selection serves several purposes, such as dimensionality reduction, improvement of model effectiveness, and improvement of model performance. With the flourishing of the information age, huge amounts of high-dimensional data are generated every day, and labeling such data requires great effort and time. Therefore, various algorithms have been proposed to handle such data, among which unsupervised feature selection has attracted tremendous interest. In many classification tasks, researchers have found that samples from the same class usually lie close to each other; thus, local compactness is of great importance for evaluating a feature. In this manuscript, we propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features. To demonstrate its efficiency and accuracy, we perform extensive experiments on several data sets. The effectiveness and superiority of our method are then revealed by addressing clustering tasks, where performance is measured by several well-known evaluation metrics and efficiency is reflected by the corresponding running time. As the simulation results reveal, our proposed algorithm appears to be more accurate and efficient than existing algorithms.
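The local-compactness intuition above (samples of the same class lie close together, so a good feature should keep neighbouring samples close) can be illustrated with a small sketch. The abstract does not give the Compactness Score formula, so the Python code below is not CSUFS. It is a hypothetical scoring rule, assuming Euclidean k-nearest neighbours found in the full feature space and, per feature, the mean squared difference to those neighbours divided by the feature's variance (lower means more compact). The names `local_compactness_scores` and `select_features` and the parameter `k` are illustrative assumptions, and the example relies on NumPy.

```python
import numpy as np


def local_compactness_scores(X, k=5):
    """Hypothetical per-feature local compactness score (lower = more compact).

    Assumptions (NOT the CSUFS formula): neighbours are found with Euclidean
    distance in the full feature space, and a feature's score is the mean
    squared difference between each sample and its k nearest neighbours on
    that feature, divided by the feature's variance.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between samples, shape (n, n).
    sq_norms = np.sum(X ** 2, axis=1)
    dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    # Indices of the k nearest neighbours of every sample, shape (n, k).
    nbrs = np.argsort(dist, axis=1)[:, :k]
    # Squared per-feature differences to those neighbours, shape (n, k, d).
    diffs = (X[:, None, :] - X[nbrs]) ** 2
    local_spread = diffs.mean(axis=(0, 1))
    # Normalise by global variance so features on different scales compare
    # fairly; constant features carry no information and get the worst score.
    var = X.var(axis=0)
    safe_var = np.where(var > 0, var, 1.0)
    return np.where(var > 0, local_spread / safe_var, np.inf)


def select_features(X, m, k=5):
    """Return the indices of the m most locally compact features."""
    return np.argsort(local_compactness_scores(X, k))[:m]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Feature 0 separates two tight groups; feature 1 is uniform noise.
    f0 = np.concatenate([rng.normal(0.0, 0.1, 50), rng.normal(10.0, 0.1, 50)])
    f1 = rng.uniform(0.0, 1.0, 100)
    X = np.column_stack([f0, f1])
    print(local_compactness_scores(X))
    print(select_features(X, 1))
```

Dividing by the global variance is one possible design choice, similar in spirit to the Laplacian Score: a feature scores well when neighbours agree on it relative to how much it varies overall, rather than merely because its values are small.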