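Project title: Research on Multi-Instance Multi-Label Learning Algorithms Based on Gaussian Process Models
Project number: No.61503058
Project type: Young Scientists Fund project
Year of approval: 2016
Project discipline: Other
Project author: He Jianjun (贺建军)
Author affiliation: Dalian Minzu University (大连民族大学)
Project funding: CNY 220,000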
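Chinese abstract (translated): Multi-instance multi-label learning is a machine learning framework proposed in recent years for handling multi-semantic data. Because it makes it feasible to uncover the driving relationship between a sample and its class labels, it is attracting growing attention. The Gaussian process model is a kernel method with advantages such as ease of implementation and the ability to adaptively discover relational information. This project aims to build, on the basis of the Gaussian process model, a multi-instance multi-label learning algorithm for large-scale, incompletely annotated multi-semantic data. It first designs a Gaussian process model with a new structure, to solve the problem of simultaneously mining two important kinds of information: the relationships between instances and labels, and the relationships among labels. It then builds a lower-complexity model-solving method based on stochastic variational inference, to solve the problem of handling large-scale training data. Finally, drawing on ideas from PU (positive and unlabeled) learning, it establishes a two-stage strategy to solve the problem of making effective use of incompletely annotated data, thereby achieving the overall goal. By using the Gaussian process model, the project not only solves the core algorithm-design problem of simultaneously mining instance-label and label-label relationships, but also solves the problem that kernel methods are too computationally expensive to handle large-scale data, which can effectively advance the application of multi-instance multi-label learning to big data.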
Chinese keywords (translated): Weak label learning; Multi-instance learning; Multi-label learning
English abstract: Multi-instance multi-label learning is a recently proposed machine learning framework for handling multi-semantic data. Because it makes it possible to explain why a given sample has certain class labels, the framework is attracting more and more attention. The Gaussian process model is a kernel method with merits such as ease of implementation and adaptive discovery of relationships among variables. This project aims to develop a novel multi-instance multi-label learning algorithm, based on the Gaussian process model, for large-scale incompletely annotated multi-semantic data. The research has three parts: designing a new Gaussian process model that simultaneously describes the relationships between instances and labels and the relationships among labels; proposing a solution approach with lower computational cost for the Gaussian process model, based on stochastic variational inference, to handle large-scale training data; and developing a two-step strategy, based on ideas from positive and unlabeled learning, to handle incompletely annotated data. Based on the Gaussian process model, we not only develop a model that simultaneously describes the relationships between instances and labels and the relationships among labels, which is a key problem in developing multi-instance multi-label learning algorithms, but also address the difficulty kernel methods have in processing large-scale training data. This project will promote the application of multi-instance multi-label learning to big data.
English keywords: Weak label learning; Multi-instance learning; Multi-label learning