Multi-objective Feature Selection with Missing Data in Classification

Feature selection (FS) is an important research topic in machine learning. FS is usually modelled as a bi-objective optimization problem whose objectives are 1) classification accuracy and 2) number of features. One of the main issues in real-world applications is missing data. Databases with missing data are likely to be unreliable, so FS performed on a data set with missing values is also unreliable. To control this issue directly, we propose in this study a novel modelling of FS: we include reliability as the third objective of the problem. To address the modified problem, we propose the application of the non-dominated sorting genetic algorithm-III (NSGA-III). We selected six incomplete data sets from the University of California Irvine (UCI) machine learning repository and used the mean imputation method to deal with the missing data. In the experiments, k-nearest neighbors (k-NN) is used as the classifier to evaluate the feature subsets. Experimental results show that the proposed three-objective model coupled with NSGA-III efficiently addresses the FS problem for the six data sets included in this study.
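To make the three-objective formulation concrete, here is a minimal sketch of how one candidate feature subset might be scored. Several parts are assumptions, not details taken from the abstract: reliability is approximated by the fraction of missing entries in the selected columns, the k-NN error is estimated by leave-one-out, k defaults to 3, and the helper names (`mean_impute`, `knn_loo_error`, `objectives`, `dominates`) are hypothetical. NSGA-III itself is not implemented here; only the objective evaluation and the Pareto-dominance test it would rely on are shown.

```python
import numpy as np

def mean_impute(X):
    """Replace NaN entries with the mean of the observed values in each column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)  # assumes no column is entirely missing
    return np.where(np.isnan(X), col_means, X)

def knn_loo_error(X, y, k=3):
    """Leave-one-out error rate of a k-NN classifier with Euclidean distance."""
    y = np.asarray(y)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample may not vote for itself
    errors = 0
    for i in range(len(y)):
        neighbours = np.argsort(dist[i])[:k]
        labels, counts = np.unique(y[neighbours], return_counts=True)
        errors += int(labels[np.argmax(counts)] != y[i])
    return errors / len(y)

def objectives(mask, X_raw, y, k=3):
    """Return (error rate, number of features, missing rate); all are minimised."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        # Assumption: an empty subset gets the worst error and reliability values.
        return (1.0, 0, 1.0)
    X_sel = np.asarray(X_raw, dtype=float)[:, mask]
    # Assumed reliability proxy: share of missing cells among the selected columns.
    missing_rate = float(np.isnan(X_sel).mean())
    error = float(knn_loo_error(mean_impute(X_sel), y, k))
    return (error, int(mask.sum()), missing_rate)

def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return all(p <= q for p, q in zip(a, b)) and any(p < q for p, q in zip(a, b))

# Toy data, for illustration only: feature 0 is complete, feature 1 has gaps.
X_raw = np.array([[0.0, 5.0], [0.1, np.nan], [0.2, 1.0],
                  [1.0, np.nan], [1.1, 3.0], [1.2, 2.0]])
y = np.array([0, 0, 0, 1, 1, 1])

only_first = objectives([True, False], X_raw, y)
both = objectives([True, True], X_raw, y)
print(only_first, both, dominates(only_first, both))
```

Under this assumed proxy, a subset that avoids columns with many missing values scores better on the third objective, so a multi-objective search can trade it off against accuracy and subset size instead of hiding it behind the imputation step.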



Feature selection


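Feature selection, also called feature subset selection (FSS) or attribute selection, means choosing N of the M available features so that a specific metric of the system is optimized. It is the process of picking the most effective features from the original set in order to reduce the dimensionality of the data set. It is an important means of improving the performance of learning algorithms and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are the key to training a model.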
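The "choose N of M features to optimize a metric" definition can be illustrated with a brute-force sketch. The function name `best_subset`, the relevance values, and the additive scoring rule are all hypothetical, chosen only to keep the example small.

```python
from itertools import combinations

def best_subset(n_features, subset_size, score):
    """Return the tuple of `subset_size` feature indices with the highest score."""
    return max(combinations(range(n_features), subset_size), key=score)

# Hypothetical per-feature relevance values, for illustration only.
relevance = [0.2, 0.9, 0.1, 0.7]

# Assumed metric: the sum of the relevance values of the chosen features.
chosen = best_subset(len(relevance), 2, lambda idx: sum(relevance[i] for i in idx))
print(chosen)
```

Exhaustive search evaluates every one of the C(M, N) candidate subsets, so it is only practical for small M; this is one reason heuristic searches such as genetic algorithms are commonly used for feature selection.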
