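Project title: Research on Feature Extraction and Selection Methods in Multi-label Classification
Project number: No.61273246
Project type: General Program
Year approved: 2013
Discipline: Automation Technology, Computer Technology
Project author: Jianhua Xu (许建华)
Author affiliation: Nanjing Normal University (南京师范大学)
Project funding: 790,000 yuan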
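Abstract (translated from Chinese): Multi-label classification is a pattern recognition problem in which a sample can belong to several classes (or labels) at once and the classes may overlap. Its distinctive character lies mainly in the one-to-many mapping from samples to labels and in the correlations among labels. This project uses constraints, second-order moments, multi-objective optimization, and related tools to describe this special information and integrates it effectively into multi-label feature extraction and selection methods. We study a weighted assignment strategy for multi-label samples and implement a feature extraction algorithm based on data decomposition and linear discriminant analysis; we minimize the mean squared projection errors of the samples and of the labels while maximizing the correlation between samples and labels, and construct an optimization problem from the linear combination of these three terms to obtain a feature extraction algorithm; and, taking the multi-label support vector machine as the baseline algorithm, we design and implement an embedded feature extraction algorithm. Using multi-objective evolutionary optimization, we simultaneously optimize two performance measures, one based on label ranking and one based on label subsets, to accomplish multi-label feature selection; we design and implement an efficient multi-label linear support vector machine and build a feature ranking and selection algorithm based on sequential backward elimination. The research is expected to further improve the performance and computational complexity of multi-label classification algorithms and the interpretability of the models, and it is of significance for the development of pattern recognition theory and applications.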
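Keywords (translated from Chinese): multi-label classification; feature extraction; feature selection; label dimensionality reduction; protein data sets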
English abstract: Multi-label classification is a pattern recognition problem in which an instance may belong to multiple classes (labels) simultaneously, so the classes overlap. Its distinctive character lies mainly in the one-to-many mapping from instances to labels and in the correlations among labels. In this project, these properties are described using constraints, second moments, multi-objective optimization, and related tools, and are then integrated into multi-label feature extraction and selection methods. We investigate a weighted assignment strategy for multi-label instances and construct a corresponding feature extraction algorithm based on data decomposition and linear discriminant analysis. By minimizing the squared projection errors for instances and for labels, and maximizing the correlation between instances and labels at the same time, an optimization problem formed as a linear combination of these three objectives is designed and solved for multi-label feature extraction. An embedded feature extraction approach is presented, with the multi-label support vector machine used as the baseline algorithm. Using multi-objective evolutionary optimization, a multi-label feature selection method is constructed by simultaneously optimizing two performance indexes from label ranking-based and labelset-based m
English keywords: Multi-label classification; feature extraction; feature selection; label dimensionality reduction; protein data sets