A concise method for feature selection via normalized frequencies

Feature selection is an important part of building a machine learning model. By eliminating redundant or misleading features from the data, the model can achieve better performance while reducing the demand on computing resources. Metaheuristic algorithms, such as swarm intelligence algorithms and evolutionary algorithms, are the most common way to implement feature selection. However, they are relatively complex and slow. In this paper, a concise method is proposed for universal feature selection. The proposed method uses a fusion of the filter method and the wrapper method, rather than a combination of them. In the method, one-hot encoding is used to preprocess the dataset, and random forest is used as the classifier. The proposed method uses normalized frequencies to assign a value to each feature, and these values are used to find the optimal feature subset. Furthermore, we propose a novel approach to exploiting the outputs of mutual information, which provides a better starting point for the experiments. Two real-world datasets from the field of intrusion detection were used to evaluate the proposed method. The evaluation results show that the proposed method outperformed several state-of-the-art related works in terms of accuracy, precision, recall, F-score and AUC.
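The abstract mentions mutual information as the filter-style signal that gives the search a starting point, but it does not say how those outputs are exploited. The following is therefore only a minimal sketch of a generic mutual-information ranking over one-hot encoded columns, not the paper's procedure. The column names (`proto_tcp`, `flag_syn`, `constant`), the toy data, and the helper names `mutual_information` and `rank_features` are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y), in nats, between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(columns, labels):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {
        name: mutual_information(values, labels)
        for name, values in columns.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up one-hot columns and binary labels, for illustration only.
columns = {
    "proto_tcp": [1, 1, 0, 0, 1, 0, 1, 0],  # identical to the labels
    "flag_syn": [1, 0, 1, 0, 1, 0, 1, 0],   # partly related to the labels
    "constant": [1, 1, 1, 1, 1, 1, 1, 1],   # carries no information
}
labels = [1, 1, 0, 0, 1, 0, 1, 0]

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

A ranking of this kind could seed a wrapper-style search, for example by evaluating a classifier on the top-ranked columns first. How the paper combines this ranking with normalized frequencies is not stated in the abstract.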

Related content

Feature selection

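Feature selection, also called feature subset selection (FSS) or attribute selection, means choosing N features from M existing features so that a specific metric of the system is optimized. It is the process of selecting the most effective features from the original ones in order to reduce the dimensionality of the dataset. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are key to training the model.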
