This paper proposes SFE (Simple, Fast, and Efficient), a new feature selection algorithm for high-dimensional datasets. SFE performs its search using a search agent and two operators, non-selection and selection, and proceeds in two phases: exploration and exploitation. In the exploration phase, the non-selection operator performs a global search over the entire search space for irrelevant, redundant, trivial, and noisy features, and changes their status from selected to non-selected. In the exploitation phase, the selection operator searches for features with a high impact on the classification results and changes their status from non-selected to selected. SFE is successful at selecting features from high-dimensional datasets; however, once it has reduced the dimensionality of a dataset, its performance cannot be increased significantly. In such situations, an evolutionary computation method can be used to find a more efficient feature subset in the new, reduced search space. To this end, the paper also proposes a hybrid algorithm, SFE-PSO, which combines SFE with particle swarm optimization (PSO) to find an optimal feature subset. The efficiency and effectiveness of SFE and SFE-PSO are evaluated on 40 high-dimensional datasets and compared with six recently proposed feature selection algorithms. The results indicate that the two proposed algorithms significantly outperform the other algorithms and can serve as efficient and effective methods for selecting features from high-dimensional datasets.
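
The two operators described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the operators choose features or how candidates are accepted, so the greedy acceptance rule, the parameters `rate` and `n_add`, the function names, and the toy fitness function are all assumptions made for illustration. In practice the fitness would presumably be a classifier's accuracy on the selected features.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[bool]  # True = feature selected, False = non-selected


def non_selection(mask: Sequence[bool], rate: float, rng: random.Random) -> Mask:
    """Exploration (assumed form): switch each selected feature off with probability `rate`."""
    return [on and rng.random() >= rate for on in mask]


def selection(mask: Sequence[bool], n_add: int, rng: random.Random) -> Mask:
    """Exploitation (assumed form): switch up to `n_add` random non-selected features on."""
    new = list(mask)
    off = [i for i, on in enumerate(new) if not on]
    for i in rng.sample(off, min(n_add, len(off))):
        new[i] = True
    return new


def sfe_sketch(
    n_features: int,
    fitness: Callable[[Sequence[bool]], float],
    iterations: int = 200,
    rate: float = 0.3,   # hypothetical parameter, not from the paper
    n_add: int = 1,      # hypothetical parameter, not from the paper
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Single-agent search: try each operator in turn, keep strict improvements."""
    rng = random.Random(seed)
    best: Mask = [True] * n_features  # assumption: start with every feature selected
    best_score = fitness(best)
    for _ in range(iterations):
        for candidate in (
            non_selection(best, rate, rng),
            selection(best, n_add, rng),
        ):
            if not any(candidate):
                continue  # an empty subset cannot be evaluated by a classifier
            score = fitness(candidate)
            if score > best_score:  # greedy acceptance (assumption)
                best, best_score = candidate, score
    return best, best_score


def toy_fitness(mask: Sequence[bool]) -> float:
    """Toy stand-in for classification accuracy: features 2 and 5 are the only
    relevant ones, and every selected feature costs a small penalty."""
    relevant = (2, 5)
    return sum(mask[i] for i in relevant) - 0.01 * sum(mask)


mask, score = sfe_sketch(10, toy_fitness)
print([i for i, on in enumerate(mask) if on], round(score, 2))
```

Under this toy fitness, dropping a relevant feature always lowers the score, so the search can only discard irrelevant features. A hybrid in the spirit of SFE-PSO would pass the reduced feature set on to a population-based optimizer, which this sketch does not attempt.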