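Research on Feature Selection Based on Discrimination Structure Measurement and Class Correlation Preservation

Project title: Research on Feature Selection Based on Discrimination Structure Measurement and Class Correlation Preservation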

Project number: No.61772288

Project type: General Program

Approval year: 2018

Discipline: Other

Project author: Wei Jinmao (卫金茂)

Affiliation: Nankai University

Funding: CNY 160,000

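Chinese abstract (translated): Feature selection is a fundamental research problem in pattern recognition, machine learning, and data mining. Existing feature selection methods share a common trait: they treat both features and classes as variables and use a single scalar to measure a feature's ability to discriminate between classes. For complex classification problems such as multi-class and multi-label classification, a single scalar value cannot show how a feature contributes to the different aspects of the problem. In addition, classes in such problems are not only mutually opposed; they may also be compatible to varying degrees or related in other complex ways. Existing methods, which aim to separate different classes, cannot effectively account for these relations or treat them differently. This project therefore studies feature selection based on structural measurement of classification ability and preservation of class correlation relations. The main research content includes: measuring the classification ability of features based on local learning and related theories; developing feature selection methods that preserve inter-class correlations; and validating the methods on public machine learning datasets. The research is intended as a preliminary exploration of feature selection methods for complex classification problems.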

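Chinese keywords (translated): feature selection; classification model; classification problem; classification algorithm; machine learning and data mining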

English abstract: Feature selection is a fundamental research issue in pattern recognition, machine learning, and data mining. Existing feature selection methods share a common trait: both features and classes are treated as variables, and a scalar value is computed to indicate the classification ability of a feature. For a complicated problem, such as multi-class or multi-label classification, a scalar value can hardly reveal the multi-faceted contributions of a feature to the different aspects of the problem. In addition, the classes in a complicated classification problem tend to be correlated with each other to different degrees, which is far from a simple relation of opposition. Such complicated relations can hardly be evaluated effectively or treated differently by traditional feature selection criteria, which mainly aim at separating different classes. In view of these issues, this project intends to study how to select features based on discrimination structure measurement and class correlation preservation. The work mainly involves: measuring the classification ability of a feature based on local learning techniques, selecting features based on class correlation preservation, and experimentally verifying the proposed approaches on public machine learning data. The work aims to make a preliminary exploration of feature selection methodology for complicated classification problems.

English keywords: feature selection; classification model; classification problem; classification algorithm; machine learning and data mining
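
The abstract's observation that a single scalar can hide how a feature contributes to different class distinctions can be illustrated with a small sketch. This is not the project's measure: the per-class-pair separation used here (absolute difference of class means) and the helper names `pairwise_separation` and `scalar_score` are assumptions chosen only for illustration, and the data are toy values.

```python
from itertools import combinations
from statistics import mean


def pairwise_separation(values, labels):
    """Return {(class_a, class_b): |mean_a - mean_b|} for one feature.

    Hypothetical stand-in for a structured (vector-valued) measure:
    one entry per pair of classes instead of one number per feature.
    """
    by_class = {}
    for value, label in zip(values, labels):
        by_class.setdefault(label, []).append(value)
    class_means = {c: mean(vs) for c, vs in by_class.items()}
    return {
        (a, b): abs(class_means[a] - class_means[b])
        for a, b in combinations(sorted(class_means), 2)
    }


def scalar_score(values, labels):
    """Collapse the per-pair structure into a single scalar (its average)."""
    return mean(list(pairwise_separation(values, labels).values()))


# Toy three-class data (illustrative values only).
labels = ["a", "a", "b", "b", "c", "c"]
feature_1 = [0.0, 0.0, 3.0, 3.0, 3.0, 3.0]  # separates class a from b and c
feature_2 = [0.0, 0.0, 0.0, 0.0, 3.0, 3.0]  # separates class c from a and b

for name, feature in [("feature_1", feature_1), ("feature_2", feature_2)]:
    print(name, scalar_score(feature, labels), pairwise_separation(feature, labels))
```

Both features receive the same scalar score, yet their per-pair entries differ: one cannot distinguish classes b and c, the other cannot distinguish a and b. A scalar ranking would treat them as interchangeable, while the per-pair view shows that they are complementary.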


Feature selection


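Feature selection, also called feature subset selection (FSS) or attribute selection, refers to choosing N features from an existing set of M features so that a specific system metric is optimized. It is the process of selecting the most effective features from the original ones to reduce the dimensionality of the dataset, an important means of improving the performance of learning algorithms, and a key data preprocessing step in pattern recognition. For a learning algorithm, good training samples are key to training the model.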
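
As a minimal sketch of "choosing N features from M", the following ranks each feature with a scalar criterion and keeps the N best. The criterion is an assumption made for illustration: a Fisher-style ratio of between-class to within-class variance, which is one common filter score and not a method described in this record. The function names `fisher_score` and `select_top_n` and the data are hypothetical.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Between-class variance divided by within-class variance for one feature."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    overall = mean(values)
    n = len(values)
    between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups.values()) / n
    within = sum(len(g) * pvariance(g) for g in groups.values()) / n
    return between / (within + 1e-12)  # small constant avoids division by zero


def select_top_n(rows, labels, n_select):
    """Return the (sorted) indices of the n_select highest-scoring features."""
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    scores = [fisher_score(col, labels) for col in columns]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_select])


# Toy data: M = 3 features, 4 samples, 2 classes (illustrative values only).
rows = [
    [1.0, 5.0, 0.2],
    [1.2, 1.0, 0.9],
    [3.0, 4.0, 0.8],
    [3.2, 2.0, 1.5],
]
labels = [0, 0, 1, 1]

print(select_top_n(rows, labels, n_select=2))  # prints [0, 2]
```

Feature 1 has identical class means, so its score is zero and it is dropped. Scoring each feature independently like this ignores redundancy and interactions between features, which is why wrapper and embedded methods exist alongside filter criteria.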
