Project title: Research on Web Mining Based on Transfer Learning
Project number: No.60873211
Project type: General Program
Year of approval: 2009
Discipline: Mining Engineering
Principal investigator: Xue Guirong (薛贵荣)
Institution: Shanghai Jiao Tong University
Funding: 260,000 yuan
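Abstract (translated from Chinese): Traditional Web mining algorithms require large amounts of labeled training data, which costs considerable manpower and material resources. Without large amounts of labeled data, much learning-related research and many applications cannot be carried out. To address problems and challenges in Web mining such as the difficulty of obtaining training data, outdated training data, and the inability to make full use of large amounts of surplus data, this project studied the basic principles of transfer learning and proposed theoretical approaches including spectral transfer, translated learning, and structured transfer learning. For key problems in Web mining research, namely ranking, multilingual learning, cross-media learning, and classification in the Web environment, it studied new transfer algorithms to address these problems and challenges. The research is of great significance for advancing the large-scale application of Web mining research, reducing the human and financial cost of labeling Web data, and improving the performance of Web mining. Furthermore, it brings the adaptability of machine learning to a new level and broadens the general applicability of machine learning algorithms.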
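Keywords (translated from Chinese): Transfer Learning; Translated Learning; Heterogeneous Transfer Learning; Web Mining; Cross-Media Learning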
English abstract: Traditional Web mining requires a large amount of labeled training data, which costs significant manpower and material resources. Without such labeled data, traditional learning tasks cannot be carried out. To address challenges such as the difficulty of acquiring training data, training data becoming out of date, and the waste of old training data, this research project developed a transfer learning framework that utilizes knowledge from related but different domains to solve these Web mining issues. Three transfer learning algorithms were developed: eigentransfer, translated learning, and structured transfer learning. We then focused on key research topics including transfer classification on the Web, multilingual transfer learning, cross-media transfer learning, and transfer learning to rank. The research helped to promote more Web mining applications, to reduce the human effort and financial resources consumed by labeling, and to improve the performance of Web mining. Furthermore, transfer learning brought machine learning to a new level of application and broadened the range of problems to which machine learning algorithms can be applied.
English keywords: Transfer Learning; Translated Learning; Heterogeneous Transfer Learning; Web Mining; Cross-Media Learning