Feature selection is an important part of building a machine learning model. By eliminating redundant or misleading features from the data, a machine learning model can achieve better performance while reducing the demand on computing resources. Metaheuristic algorithms, such as swarm intelligence algorithms and evolutionary algorithms, are widely used to implement feature selection. However, they tend to be relatively complex and slow. This paper proposes a concise, general-purpose feature selection method. The proposed method fuses the filter method and the wrapper method rather than simply combining them. The dataset is preprocessed with one-hot encoding, and a random forest is used as the classifier. The method assigns each feature a value based on normalized frequencies, and these values are used to find the optimal feature subset. Furthermore, we propose a novel way to exploit the outputs of mutual information, which provides a better starting point for the experiments. Two real-world datasets from the field of intrusion detection were used to evaluate the proposed method. The evaluation results show that the proposed method outperformed several related state-of-the-art works in terms of accuracy, precision, recall, F-score, and AUC.
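For readers unfamiliar with filter-wrapper pipelines, the following is a minimal, generic sketch and not the authors' method. The abstract does not specify how the normalized-frequency scores are computed, so that step is omitted. The sketch assumes scikit-learn and numeric features (categorical columns already one-hot encoded, e.g. with `pandas.get_dummies`). It uses `mutual_info_classif` as the filter to order the features, then greedily adds features in that order, keeping each one only if it improves the cross-validated accuracy of a `RandomForestClassifier`. The function name `mi_ranked_forward_selection` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score


def mi_ranked_forward_selection(X, y, max_features=None, cv=3, random_state=0):
    """Hypothetical filter-wrapper sketch (not the paper's algorithm).

    Filter step: rank features by mutual information with the label.
    Wrapper step: walk the ranking and keep a feature only if it raises
    the cross-validated accuracy of a random forest.
    """
    # Filter: estimated mutual information between each feature and y.
    mi = mutual_info_classif(X, y, random_state=random_state)
    order = np.argsort(mi)[::-1]  # most informative feature first

    selected, best_score = [], -np.inf
    for idx in order[:max_features]:
        candidate = selected + [int(idx)]
        clf = RandomForestClassifier(n_estimators=50, random_state=random_state)
        # Wrapper: mean accuracy over the cv folds on the candidate subset.
        score = cross_val_score(clf, X[:, candidate], y, cv=cv).mean()
        if score > best_score:  # keep the feature only if it helps
            selected, best_score = candidate, score
    return selected, best_score


if __name__ == "__main__":
    # Synthetic stand-in data; not the intrusion-detection datasets from the paper.
    X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                               n_redundant=1, random_state=0)
    features, acc = mi_ranked_forward_selection(X, y)
    print("selected feature indices:", features)
    print(f"cross-validated accuracy: {acc:.3f}")
```

Starting the wrapper from the mutual-information ordering means it trains at most one candidate model per feature instead of searching all subsets. The trade-off is that a single greedy pass can miss features that only help in combination.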