Feature selection reduces the dimensionality of data by identifying a subset of the most informative features. In this paper, we propose an innovative framework for unsupervised feature selection called fractal autoencoders (FAE). FAE trains a neural network to pinpoint informative features by exploring representability globally and excavating diversity locally. Architecturally, FAE extends an autoencoder with a one-to-one scoring layer and a small sub-neural network that performs feature selection in an unsupervised fashion. Despite this concise architecture, FAE achieves state-of-the-art performance: extensive experiments on fourteen datasets, including very high-dimensional data, demonstrate its superiority over existing methods for unsupervised feature selection. In particular, FAE shows substantial advantages in gene expression data exploration, reducing measurement cost by about $15$\% compared with the widely used L1000 landmark genes. Further, we demonstrate through an application that the FAE framework is easily extensible.
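The architecture described above (an autoencoder, a one-to-one scoring layer, and a small selection sub-network) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it rests on several assumptions that the abstract does not state: a purely linear encoder and decoder, a sub-network that shares those weights and sees only the `k` highest-scored features, and a loss that sums the two reconstruction errors with an L1 penalty on the scores. All names (`top_k_mask`, `fae_forward`, `fae_loss`, `lam_local`, `lam_l1`) are hypothetical, and training is omitted.

```python
import numpy as np

def top_k_mask(scores, k):
    """Return a 0/1 mask that keeps the k largest-magnitude scores."""
    mask = np.zeros_like(scores, dtype=float)
    mask[np.argsort(-np.abs(scores))[:k]] = 1.0
    return mask

def fae_forward(x, scores, w_enc, w_dec, k):
    """Return (global reconstruction, local reconstruction) for a batch x."""
    # One-to-one scoring layer: each input feature has exactly one weight.
    weighted = x * scores
    # Assumed sub-network input: only the k highest-scored features survive.
    selected = weighted * top_k_mask(scores, k)
    # Assumption: both paths share one linear encoder and decoder.
    recon_global = (weighted @ w_enc) @ w_dec
    recon_local = (selected @ w_enc) @ w_dec
    return recon_global, recon_local

def fae_loss(x, scores, w_enc, w_dec, k, lam_local=1.0, lam_l1=0.1):
    """Assumed objective: global error + local error + L1 penalty on scores."""
    recon_global, recon_local = fae_forward(x, scores, w_enc, w_dec, k)
    err_global = np.mean((x - recon_global) ** 2)
    err_local = np.mean((x - recon_local) ** 2)
    return err_global + lam_local * err_local + lam_l1 * np.sum(np.abs(scores))

# Toy data: 8 samples, 6 features, a 3-unit bottleneck, 2 features to select.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 6))
scores = rng.uniform(size=6)
w_enc = rng.normal(size=(6, 3))
w_dec = rng.normal(size=(3, 6))

loss = fae_loss(x, scores, w_enc, w_dec, k=2)
selected_features = np.flatnonzero(top_k_mask(scores, 2))
```

In this sketch, the selected features after training would simply be the indices in `selected_features`; the global path encourages the scores to preserve overall reconstruction quality, while the local path checks that the chosen subset alone can still reconstruct the input.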