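Project title: Research on a Dynamic Ontology Learning Model for Online Public Opinion (面向网络舆论的动态本体学习模型研究)
Project number: No.61003100
Project type: Young Scientists Fund
Approval year: 2011
Discipline: Metallurgy and Metalworking Technology
Project author: Zheng Haitao (郑海涛)
Author affiliation: Tsinghua University (清华大学)
Funding amount: 70,000 CNY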
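Chinese abstract: Semantic analysis of online public opinion is an important foundation for applications such as network supervision, network monitoring, web knowledge discovery, and web behavior analysis. An ontology is a formal description of the concepts in a domain and the semantic relations among them, and it represents domain-specific semantic information well. This project performed semantic analysis of online public opinion by building a dynamic ontology learning model. Ontology learning is a branch of information extraction whose goal is to extract key concepts and their semantic relations, automatically or semi-automatically, from a collection of domain datasets in order to construct an ontology. First, to reflect the temporal characteristics of online public opinion, time attributes were defined for the concepts, relations, and instances in the ontology. Second, a dynamic extraction model for key concepts was built: after an in-depth analysis of five machine learning methods (support vector machines, artificial neural networks, Bayesian learning, inductive learning, and reinforcement learning), a transfer learning mechanism was adopted to extract the key concepts of online public opinion at different points in time. Third, based on the extracted key concepts, a computational model of semantic relations was built; it uses the normalized information distance, which is defined in terms of Kolmogorov complexity, to estimate the semantic distance between concepts, and on that basis an ontology for online public opinion was constructed. Finally, the soundness and completeness of the model were verified on large-scale real-world datasets.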
Chinese keywords: online public opinion; ontology learning; transfer learning; normalized information distance
English abstract: Semantic analysis of online public opinion is an important cornerstone for network monitoring, web knowledge discovery, and web behavior analysis. An ontology is a formal description of the concepts in a specific domain and their semantic relations. This project aims to analyze public opinion semantically by constructing a dynamic ontology learning model. Ontology learning is a branch of information extraction that focuses on extracting key concepts and their semantic relations in order to construct an ontology automatically or semi-automatically. First, we define time attributes for the concepts, relations, and instances in the ontology, based on the temporal characteristics of public opinion. Second, we construct a dynamic key-concept extraction model: we analyze five machine learning methods (support vector machines, artificial neural networks, Bayesian learning, inductive learning, and reinforcement learning) and use transfer learning to extract the key concepts of public opinion at different points in time. Third, we construct a model that computes the semantic relations between the extracted key concepts. The model uses the normalized information distance, which is defined in terms of Kolmogorov complexity, to estimate the semantic distance between concepts, and from these distances builds an ontology for online public opinion. Finally, we use large-scale real-world datasets to evaluate the soundness and completeness of the proposed model.
English keywords: Public Opinion; Ontology Learning; Transfer Learning; Normalized Information Distance
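
Illustrative note: the abstract names the normalized information distance as the basis for estimating semantic distance between concepts, but it does not say how the distance is computed in practice. Kolmogorov complexity is uncomputable, so a common practical stand-in is the normalized compression distance (NCD), which replaces the complexity of a string with its compressed length. The sketch below is only an assumption about how such an estimate could look, not the project's actual method: the choice of zlib as the compressor and the function names compressed_size and ncd are hypothetical.

```python
import zlib


def compressed_size(text: str) -> int:
    """Length in bytes of the zlib-compressed UTF-8 encoding of text.

    Assumption: compressed length serves as a rough, computable proxy
    for Kolmogorov complexity.
    """
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Values near 0 suggest the texts
    share most of their information; values near 1 suggest little overlap.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical text snippets standing in for the contexts of two concepts.
    economy = "stock market prices fell as investors worried about inflation " * 20
    weather = "heavy rain and strong winds are expected along the coast tonight " * 20
    print("same topic:     ", ncd(economy, economy))
    print("different topic:", ncd(economy, weather))
```

In an ontology learning setting, x and y would be the text associated with two extracted key concepts (for example, the posts in which each concept appears at a given time point), and smaller distances would suggest a candidate semantic relation. Real compressors only approximate the ideal distance, so the values are best read as relative rankings.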