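Project title: Secure Transfer Learning Methods for Big Data
Project number: No.61502265
Project type: Young Scientists Fund project
Approval year: 2016
Discipline: Other
Principal investigator: Long Mingsheng (龙明盛)
Affiliation: Tsinghua University
Funding amount: CNY 220,000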
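Abstract (Chinese, translated): With the rapid growth of the Internet, information networks generate large volumes of unlabeled or weakly labeled data, which poses new challenges for supervised machine learning methods that rely on labeled data. Meanwhile, semantic networks such as Wikipedia maintain rich labeled data, and how to transfer and reuse these labeled data is the key to weakly supervised machine learning. Transfer learning is an important weakly supervised machine learning technique whose goal is to discover invariant feature structures and unbiased recognition models across heterogeneous domains, improving the cross-domain generalization performance of machine learning. In recent years, transfer learning theories and methods have developed rapidly, but bottlenecks remain in model security and algorithm scalability, so they cannot yet adequately meet the needs of large-scale cross-domain data analysis. This project proposes to study secure transfer learning methods for big data, focusing on breaking through the bottlenecks of model security and algorithm scalability. The main research topics are: multi-kernel distribution discrepancy measures; low-bias, low-variance distribution correction methods; deep network transfer learning methods; transfer hashing methods; and scalable optimization algorithms and distributed system implementations for these methods. This research will help advance the maturity and refinement of transfer learning technology and provide solid technical support for big data analysis and mining in non-stationary environments.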
Keywords (Chinese, translated): Transfer learning; Supervised learning; Kernel methods; Deep learning; Large-scale machine learning
Abstract (English): With the rapid development of the Internet, large volumes of unlabeled or weakly labeled data are generated in information networks, which poses a new challenge to supervised machine learning from labeled data. Meanwhile, large-scale, richly labeled data are maintained in semantic networks such as Wikipedia, and how to transfer and reuse these labeled data is the key to weakly supervised machine learning. Transfer learning is an important weakly supervised machine learning technique whose goal is to learn invariant feature structures and unbiased recognition models, and hence to boost the cross-domain generalization performance of machine learning. In recent years, transfer learning theories and methods have emerged rapidly; however, owing to bottlenecks in model security and algorithm scalability, existing transfer learning techniques do not yet satisfy the requirements of large-scale cross-domain data analytics. In this research, we plan to study secure transfer learning methods for big data, breaking through the bottlenecks of model security and algorithm scalability. The main research topics include multiple-kernel distribution discrepancy measurement, low bias-variance distribution shift correction, deep neural network transfer learning, transfer learning to hash, and scalable optimization algorithms and distributed system implementations. This research will help advance the maturity and completeness of transfer learning technology and lay a solid foundation for big data analytics in non-stationary environments.
Keywords (English): Transfer Learning; Supervised Learning; Kernel Method; Deep Learning; Large-Scale Machine Learning