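Project title: Research on Feature Selection Methods Based on Discrimination Structure Measurement and Class Correlation Preservation
Project number: No.61772288
Project type: General Program
Year of approval: 2018
Discipline: Other
Project author: 卫金茂 (Wei Jinmao)
Affiliation: 南开大学 (Nankai University)
Funding: CNY 160,000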
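Chinese abstract (translated): Feature selection is a fundamental research problem in pattern recognition, machine learning, and data mining. Existing feature selection methods share a common trait: they treat both features and the class as variables and use a single scalar to measure a feature's ability to discriminate between classes. For complex classification problems such as multi-class and multi-label classification, this has two drawbacks. First, a single scalar cannot reflect how a feature contributes to the different aspects of the classification problem. Second, beyond mutual exclusion, classes in complex problems may be correlated in more intricate ways, such as being compatible to varying degrees. Existing methods, which aim only to separate the classes from one another, cannot effectively account for these relationships or treat them differently. Motivated by these considerations, this project will study feature selection methods based on structural measurement of classification ability and on preservation of class correlations. The main research topics are: measuring the classification ability of features based on theories such as local learning; feature selection methods that preserve inter-class correlations; and validating the methods on public machine learning datasets. The research is intended as a preliminary exploration of feature selection for complex classification problems.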
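Chinese keywords (translated): feature selection;classification model;classification problem;classification algorithm;machine learning and data mining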
English abstract: Feature selection is a fundamental research problem in pattern recognition, machine learning, and data mining. Existing feature selection methods share a common trait: both the features and the class are treated as variables, and a scalar value is computed to indicate the classification ability of a feature. For a complicated problem, such as multi-class or multi-label classification, a scalar value can hardly reveal the multi-faceted contributions of a feature to the different aspects of the problem. In addition, in a complicated classification problem, different classes tend to be correlated with one another to different degrees, which goes well beyond a simple opposing relation. Such complicated relations can hardly be evaluated effectively or treated differently by traditional feature selection criteria, which mainly aim at separating the classes. In view of these issues, this project intends to study how to select features based on discrimination structure measurement and class correlation preservation. The work mainly involves: measuring the classification ability of a feature based on local learning techniques, selecting features based on class correlation preservation, and experimentally verifying the proposed approaches on public machine learning data. The work aims mainly at a preliminary investigation of feature selection methodology for complicated classification problems.
English keywords: feature selection;classification model;classification problem;classification algorithm;machine learning and data mining