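Project title: Research on Key Issues of Transfer Learning in Unlabeled Data Streams
Project number: No.61305063
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology, computer technology
Project author: Zhang Yuhong (张玉红)
Author affiliation: Hefei University of Technology (合肥工业大学)
Funding amount: 230,000 CNY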
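Abstract (translated from Chinese): Because label information is hard to obtain in real applications, research on unlabeled data streams has become a hot topic. Existing work applies semi-supervised methods to incompletely labeled data streams, but these methods assume that labeled and unlabeled data are independent and identically distributed, which rarely holds in practice. This project therefore introduces transfer learning into unlabeled data streams and studies the key issues involved. First, it studies how well transfer learning theory and methods adapt to the data stream setting, and explores model representations and design methods for real-time, fast transfer subjects and transfer bridges suited to streaming environments. Second, based on data forms such as instances, features, and models, it studies how to effectively transfer labeled data into the learning process for unlabeled data, with a focus on the propagation and diffusion mechanisms of label information. In addition, to address concept drift in unlabeled data streams, it develops effective concept drift detection methods and corresponding classifier adaptation mechanisms, ultimately forming an effective knowledge transfer framework and methods for unlabeled data streams that are not restricted by the independent and identically distributed condition. Building on this work, a prototype classification system for unlabeled data streams is constructed, using web review data streams as the application background.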
Keywords (translated from Chinese): Concept drift; transfer learning; feature extraction; word similarity;
English abstract: Learning from unlabeled data streams is a hot topic because labels for data streams are difficult to obtain in real-world applications. Recently, semi-supervised learning has been used to handle unlabeled data streams. However, these approaches assume that the labeled and unlabeled data are independent and identically distributed, an assumption that rarely holds in real-world applications. Thus, transfer learning, which aims to learn from unlabeled data with the help of some labeled data, is proposed to tackle unlabeled data streams. In this proposal, we focus on the key issues of transfer learning on unlabeled data streams. More specifically, we first study how transfer learning theory and methods adapt to data streams, and explore model representation and design for the transfer subjects and transfer bridges, which must operate in real time in a streaming environment. Second, we study effective transfer learning methods for unlabeled data with the help of labeled data at the instance, feature, and model levels, focusing on methods and techniques for label propagation. In addition, regarding concept drift in data streams, we study effective methods of concept drift detection and the adaptation mechanism
English keywords: Concept Drift; Transfer Learning; Feature Extraction; Similarity of Words;