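Project title: Research on Weakly Supervised Learning Methods for Identifying the Interests of Online Social Network Users
Project number: No.61303103
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology, computer technology
Project author: Li Yan (李岩)
Author affiliation: Shenzhen Polytechnic (深圳职业技术学院)
Funding: 250,000 yuan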
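Chinese abstract: Identifying the interests of online social network users has become a fundamental problem underlying many applications. It can be formalized as a multi-label text classification problem. The research difficulty is how, under conditions of high noise, small samples, and a complex distribution over the label space, a multi-label classification algorithm can automatically build and optimize a classification model so that the classifier achieves good accuracy. Around this core problem, the project will focus on a multi-label cluster tree classification model based on weakly supervised learning, and on the key algorithms under this model: (1) a multi-label cluster tree learning and model optimization algorithm, to address the problem of learning a single classification model from small-sample, high-noise data; (2) an algorithm for learning dependencies among labels under multiple mixture distributions, to address the problem of learning complex dependencies among multiple labels; (3) an ensemble learning algorithm based on a multi-label cluster tree forest, to further address the problem of fusing multiple classification models under high-noise, small-sample conditions. The innovations of the project are: a multi-label cluster tree construction and model optimization algorithm based on error bound estimation; a method for learning dependencies among labels that fuses content attributes with the topological structure of the multi-label cluster tree; and an ensemble learning algorithm based on a multi-label cluster tree forest.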
Chinese keywords: multi-label classification; cluster tree; weakly supervised learning
English abstract: Identifying the interests of each user in a social network is a fundamental problem in many real applications, and it can be formulated as a multi-label text classification problem. The main challenge lies in the high noise of small training data sets and in the complex dependencies and correlations among multiple labels, since classification performance depends heavily on how effectively these correlations and dependencies are mined. This project will first propose a multi-label cluster tree classification model based on a weakly supervised learning strategy, and then explore the following research issues: (1) a multi-label cluster tree learning algorithm and its model optimization algorithm, to address the problem of learning a single classifier from a small, highly noisy training data set; (2) learning the dependencies among labels in a label space with multiple mixture distributions, to solve the problem of learning complex label dependencies; (3) an ensemble learning algorithm based on a multi-label cluster forest, to smoothly integrate multiple classifiers. The main innovations of this proposal are as follows: a multi-label cluster tree learning algorithm and its model optimization algorithm based on error bound estimation; a label dependency learning approach through the combination of content and topology info
English keywords: multi-label classification; cluster tree; weakly supervised learning