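Project title: Inductive Safe Semi-Supervised Classification Learning and Its Extensions
Project number: No.61300165
Project type: Young Scientists Fund
Approval year: 2014
Discipline: Automation technology, computer technology
Project author: Wang Yunyun (汪云云)
Affiliation: Nanjing University of Posts and Telecommunications (南京邮电大学)
Funding: CNY 230,000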
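Abstract (translated from Chinese): Semi-supervised classification is one of the machine learning tasks currently receiving wide attention. It aims to learn from both labeled and unlabeled samples, in the hope of achieving better classification performance than supervised classification, which uses labeled samples alone. Despite considerable progress, a fundamental problem remains: semi-supervised classification is unsafe, in that a semi-supervised method can in some cases perform worse than its supervised counterpart, which severely limits practical application. Safe semi-supervised classification has therefore become an extremely important research task. To our knowledge, however, related work is quite limited and is entirely transductive: it learns class labels only for the given unlabeled samples and cannot predict unseen samples. Real classification tasks often require predictions on unseen samples and therefore call for inductive learning methods. This project aims to directly propose an inductive safe semi-supervised classification method, on the one hand to fill a gap in existing research and on the other to further improve the safety and applicability of semi-supervised classification methods. The work proceeds systematically through modeling, algorithm design and implementation, theoretical analysis, and experimental validation. It also attempts to extend the idea to extreme semi-supervised classification learning in order to address more challenging application problems.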
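Keywords (translated from Chinese): semi-supervised classification; safe learning; inductive learning; feature selection; kernel learning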
English abstract: Semi-supervised classification learning is one of the machine learning tasks that has attracted much attention recently. It aims to use both labeled and unlabeled data to achieve better classification performance than supervised classification learning based on the labeled data alone. Despite many achievements, an essential problem remains in semi-supervised classification: it is unsafe, i.e., semi-supervised classification methods may in some cases perform even worse than the corresponding supervised ones, which seriously limits their real-world application. As a result, safe semi-supervised classification naturally becomes an extremely important learning task. As far as we know, however, there are few related works, and all of them are transductive, i.e., they aim to obtain class labels only for the given unlabeled data and are thus unable to predict unseen data. Many real classification tasks require predictions on unseen data and therefore need inductive classification methods. The purpose of the project is to directly develop an inductive safe semi-supervised classification method, on the one hand to fill a gap in current research and on the other hand to further improve the safety and applicability of semi-supervised classification methods. The entire work
英文关键词: Semi-supervised classification;safe learning;inductive learning;feature selection;kernel learning