Major complications arise from the recent increase in the amount of high-dimensional data, including high computational costs and memory requirements. Feature selection, which identifies the most relevant and informative attributes of a dataset, has been introduced as a solution to this problem. Most of the existing feature selection methods are computationally inefficient; inefficient algorithms lead to high energy consumption, which is not desirable for devices with limited computational and energy resources. In this paper, a novel and flexible method for unsupervised feature selection is proposed. This method, named QuickSelection, introduces the strength of the neuron in sparse neural networks as a criterion to measure feature importance. This criterion, blended with sparsely connected denoising autoencoders trained with the sparse evolutionary training procedure, derives the importance of all input features simultaneously. We implement QuickSelection in a purely sparse manner, as opposed to the typical approach of using a binary mask over connections to simulate sparsity, which yields a considerable speedup and memory reduction. When tested on several benchmark datasets, including five low-dimensional and three high-dimensional datasets, the proposed method achieves the best trade-off between classification and clustering accuracy, running time, and maximum memory usage among widely used feature selection approaches. Moreover, our proposed method requires the least amount of energy among the state-of-the-art autoencoder-based feature selection methods.
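To make the strength criterion concrete, the following is a minimal Python sketch of strength-based feature selection, assuming (as a simplification of the method described above) that the strength of an input neuron is the sum of the absolute weights of its connections to the first hidden layer of the sparse autoencoder. The function names, the use of a SciPy sparse matrix, and the toy weight matrix with roughly 30% density are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy import sparse

def neuron_strength(encoder_weights: sparse.csr_matrix) -> np.ndarray:
    """Strength of each input neuron, taken here as the sum of the absolute
    weights of its sparse connections to the first hidden layer
    (rows = input features, columns = hidden units)."""
    return np.asarray(abs(encoder_weights).sum(axis=1)).ravel()

def select_features(encoder_weights: sparse.csr_matrix, k: int) -> np.ndarray:
    """Indices of the k input features with the largest neuron strength."""
    strength = neuron_strength(encoder_weights)
    return np.argsort(strength)[::-1][:k]

# Toy example (hypothetical weights): 6 input features, 4 hidden units,
# with ~30% of the connections present, mimicking a sparse trained encoder.
rng = np.random.default_rng(0)
dense = rng.normal(size=(6, 4)) * (rng.random((6, 4)) < 0.3)
W = sparse.csr_matrix(dense)
print(select_features(W, k=3))  # indices of the 3 strongest input features
```

In the actual method, the weights would come from a denoising autoencoder trained with the sparse evolutionary training procedure, so that the strength reflects which inputs the learned sparse topology relies on; the purely sparse representation above is also what avoids storing a dense weight matrix with a binary mask.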