Research on Multi-Label Classification Methods for Multi-Relational Graphs Based on Probabilistic Semantic Analysis

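Project title: Research on Multi-Label Classification Methods for Multi-Relational Graphs Based on Probabilistic Semantic Analysis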

Project number: No. 61502177

Project type: Young Scientists Fund project

Year of approval: 2016

Discipline: Automation technology; computer technology

Project author: Wu Qingyao (吴庆耀)

Author affiliation: South China University of Technology (华南理工大学)

Funding amount: CNY 210,000

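Chinese abstract: With the spread of digital social media and mobile Internet applications, identifying the interests of individuals across different social networks has become a widely studied problem in data mining. The problem can be cast as multi-label classification of nodes in a multi-relational graph. Its key difficulties are that a multi-relational graph usually has only a small number of training samples, and that complex correlations exist both among nodes and among labels; a high-performance classification model must therefore combine information from labeled and unlabeled nodes to learn the correlations among nodes and among labels. This project addresses the problem by studying multi-label classification models for multi-relational graphs based on probabilistic semantic analysis. The main research contents are: (1) a meta-path based representation model for multi-relational graphs; (2) a semi-supervised learning model for multi-relational graphs based on correlations among nodes; (3) a multi-label semi-supervised learning model based on correlations among labels; (4) the design of multi-label classification algorithms for multi-relational graphs. The novel contributions are: a meta-path based representation model; a semi-supervised learning method based on probabilistic semantic analysis; and a multi-label classification method for multi-relational graphs that exploits correlations both among nodes and among labels.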

Chinese keywords: Multi-relational networks; multi-label classification; probabilistic semantic analysis; semi-supervised learning

English abstract: With the prevalence of social media and mobile Internet applications, multi-label classification of nodes in heterogeneous information networks, which contain multiple types of relationships between nodes, has become one of the most important research issues in data mining and social network analysis. The key difficulty is that only a limited number of labeled nodes are available in such networks, so it is crucial to use labeled and unlabeled data together and to exploit the correlations among nodes and among labels simultaneously in order to improve classification performance. To tackle this issue, we propose the following research contents: (1) meta-path based representation of heterogeneous information networks; (2) semi-supervised learning in heterogeneous information networks by analyzing node correlations; (3) semi-supervised learning of multi-label data by analyzing label dependencies; (4) the design of algorithms for multi-label classification in heterogeneous information networks. The novel contributions of this proposal are: a meta-path based representation model; a semi-supervised learning algorithm based on probabilistic latent semantic analysis; and multi-label classification of heterogeneous information networks by analyzing node correlations and label dependencies.

English keywords: Heterogeneous Information Networks; Multi-label Classification; Probabilistic Latent Semantic Analysis; Semi-supervised Learning
