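Project title: Research on multi-label classification algorithms based on kernel, regularization and multi-objective optimization techniques, and their applications
Project number: No.60875001
Project type: General Program
Year of approval: 2009
Discipline: Biological sciences
Principal investigator: Xu Jianhua (许建华)
Affiliation: Nanjing Normal University
Funding: 300,000 CNY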
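Chinese abstract (translated): Multi-label classification is a pattern recognition problem in which a sample may belong to several classes (or labels) at the same time. Its training set contains correlation information among sample vectors, the one-to-many mapping between sample vectors and labels, and correlation information among labels. This project uses kernel, regularization and multi-objective optimization techniques to incorporate this information into new multi-label classification algorithms. An efficient multi-label support vector machine with a zero label is proposed; on the Yeast data set its training is more than 11 times faster than that of the well-known Rank-SVM algorithm, and it needs no additional linear regression threshold function. Using the label correlation of samples, an extended one-versus-rest multi-label support vector machine is proposed to improve the performance of the original algorithm. Based on the multi-objective evolutionary optimization algorithm NSGAII, a multi-label classification kernel algorithm based on bi-objective optimization is constructed. After one-versus-one decomposition of the data set, the double-label samples in each subset are retained to characterize the correlation between paired labels; a double-label support vector machine and a double-label least squares support vector machine are constructed, and two multi-label classification algorithms based on one-versus-one decomposition are then proposed. Combining class-by-class decomposition with the support vector data description algorithm, the fastest multi-label kernel algorithm is built. To reflect the sample diversity of the training set, a weighted k-nearest neighbor multi-label algorithm with adaptive weight estimation is proposed. Two new multi-label data sets for biological applications are constructed. Some of the software and data sets are available for download by researchers in China and abroad. The research in this project helps us better understand and handle multi-label problems and further extends pattern recognition theory and its applications.
Chinese keywords (translated): multi-label classification; algorithm extension; data decomposition; support vector machine; multi-objective optimization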
English abstract: Multi-label classification is a pattern recognition problem in which an instance may belong to several classes (or labels) simultaneously. Its training sets contain correlation information among instance vectors and among labels, as well as a one-to-many mapping from instance vectors to labels. In this project, several novel multi-label classification approaches are designed and implemented that integrate these kinds of information using kernel, regularization and multi-objective optimization techniques. An efficient multi-label support vector machine that needs no additional linear regression threshold function is proposed; its training procedure is at least 11 times faster than that of the well-known Rank-SVM on the Yeast data set. On the basis of the label correlation of instances, an extended one-versus-rest support vector machine is presented to improve the classification performance of its original form. Based on the multi-objective evolutionary optimization algorithm NSGAII, a bi-objective optimization multi-label kernel method is introduced. To characterize pairwise label correlation, the double-label instances are retained after a training set is divided into two-class subsets in a pairwise way. A double-label support vector machine and a double-label least squares support vector machine are built, and two corresponding multi-label classification techniques are then constructed. Combining a one-by-one data decomposition trick with support vector data description, the fastest multi-label kernel machine is presented. To reflect the instance diversity of training sets, a weighted k-nearest neighbor method with adaptive weight estimation is introduced. Two new biological multi-label data sets are constructed. Some of the software and data sets can be downloaded from our lab homepage and used free of charge for academic purposes. This research lets us understand and handle multi-label classification problems better than before, and further develops pattern recognition theory and its applications.
English keywords: multi-label classification; algorithm extension; data decomposition; support vector machine; multi-objective optimization
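
Illustrative note: the abstract describes dividing a training set into two-class subsets in a pairwise way while keeping the double-label instances. The following is only a minimal sketch of that data decomposition step, not the project's implementation. The function name `pairwise_decompose`, the toy data, and the target encoding (+1 for the first label only, -1 for the second label only, 0 for both) are assumptions made for illustration.

```python
from itertools import combinations


def pairwise_decompose(samples, label_sets, labels):
    """Split a multi-label training set into one subset per label pair.

    For a pair (a, b), a sample carrying only a gets target +1, a sample
    carrying only b gets target -1, and a sample carrying both is kept
    with target 0 (an assumed encoding for double-label instances).
    Samples with neither label are left out of that pair's subset.
    """
    subsets = {}
    for a, b in combinations(labels, 2):
        subset = []
        for x, ys in zip(samples, label_sets):
            has_a, has_b = a in ys, b in ys
            if has_a and has_b:
                subset.append((x, 0))    # double-label instance is retained
            elif has_a:
                subset.append((x, +1))
            elif has_b:
                subset.append((x, -1))
        subsets[(a, b)] = subset
    return subsets


# Hypothetical toy data: four 2-D samples and their label sets.
samples = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.3, 0.7]]
label_sets = [{"A"}, {"B"}, {"A", "B"}, {"C"}]
subsets = pairwise_decompose(samples, label_sets, ["A", "B", "C"])
```

Each resulting subset could then be passed to a pairwise learner of the reader's choice. How the double-label targets enter the support vector machine formulation is not specified in this record, so the sketch stops at the decomposition.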