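Project title: Research on Probabilistic Topic Models Incorporating Knowledge Graphs
Project number: No.61502066
Project type: Young Scientists Fund
Year approved: 2016
Discipline: Other
Principal investigator: 李智星 (Li Zhixing)
Affiliation: Chongqing University of Posts and Telecommunications (重庆邮电大学)
Funding: 210,000 CNY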
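Chinese abstract (translated): “Understanding the Internet” is one of the long-term goals of artificial intelligence. One approach is to analyze and understand unstructured information on the Internet on the basis of existing knowledge, and thereby learn knowledge from data. Existing probabilistic topic models can analyze unstructured data but cannot effectively exploit the structured knowledge in knowledge graphs. Meanwhile, knowledge extraction frequently suffers from difficulty in locating targets and from topic drift. To address these challenges, this project plans the following research. First, we propose a new probabilistic topic model, Global Topic Random Fields. The model uses the knowledge in a knowledge graph to represent unstructured documents as graphs, and samples topics over these document graphs with the Global Random Fields we proposed, improving the quality of the generated topics. Second, the topics generated by the model free the analysis from the limits of surface vocabulary, so that documents can be analyzed at the semantic level; this improves the precision of target locating in open-domain knowledge extraction and reduces topic drift. Building on these two lines of research, we will attempt to design a “Life-Long Learning” knowledge extraction prototype that builds an initial knowledge set from online encyclopedias and similar sources, analyzes Internet documents with Global Topic Random Fields, and extracts new knowledge from them, achieving continuous learning.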
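Chinese keywords (translated): topic model; knowledge graph; probabilistic graphical model; unsupervised learning; knowledge extraction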
English abstract: “Understanding the Internet” is one of the long-term goals of artificial intelligence. One approach is to analyze and understand unstructured information on the Internet on the basis of existing knowledge. Probabilistic topic models are widely used to process unstructured data but cannot make use of the structured knowledge in knowledge graphs. Meanwhile, target locating and topic drift are two problems that knowledge extraction research still needs to solve. This research addresses these two challenges. First, we propose a novel probabilistic topic model called Global Topic Random Fields. It transforms text documents into graphs using the knowledge in knowledge graphs and then samples the topics of words with Global Random Fields. Second, based on the modelled topics, we analyze the semantics of text documents to move beyond the limits of the vocabulary. This should increase the precision of target locating and reduce topic drift in knowledge extraction tasks. Based on these techniques, we will try to design a “Life-Long Learning” knowledge extraction prototype. We will gather an initial knowledge set from Wiki sites and then extract new knowledge from the Internet with the help of Global Topic Random Fields, achieving Life-Long Learning.
English keywords: Topic Model; Knowledge Graph; Probabilistic Graphical Model; Unsupervised Learning; Knowledge Extraction
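The abstract only outlines the first step of Global Topic Random Fields: representing a document as a graph with the help of a knowledge graph. The sketch below illustrates that idea under loudly stated assumptions. The toy knowledge graph, the rule "connect two document words only if the knowledge graph relates them", and the helper names `document_graph` and `edge_agreement` are all hypothetical and are not the project's actual construction. `edge_agreement` is a generic Potts-style pairwise term, included only to show how such a graph can reward related words for sharing a topic; it is not the Global Random Fields potential or sampler.

```python
from itertools import combinations

# Hypothetical toy knowledge graph: each entity maps to the entities it is
# directly related to. Real knowledge graphs (and the project's own
# document-graph construction) will differ.
TOY_KG = {
    "einstein": {"relativity", "physics"},
    "relativity": {"einstein", "physics"},
    "physics": {"einstein", "relativity"},
    "guitar": {"music"},
    "music": {"guitar"},
}


def document_graph(tokens, kg):
    """Build an undirected graph over the distinct words of one document.

    Nodes are the document's distinct tokens; an edge joins two tokens
    only if the knowledge graph records a relation between them.
    Returns an adjacency dict {word: set(neighbouring words)}.
    """
    words = sorted(set(tokens))
    graph = {w: set() for w in words}
    for a, b in combinations(words, 2):
        # Check both directions so a one-sided KG entry still counts.
        if b in kg.get(a, set()) or a in kg.get(b, set()):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def edge_agreement(graph, topics):
    """Count edges whose two endpoints carry the same topic label.

    A simple Potts-style pairwise term (assumption, for illustration only):
    higher values mean KG-related words share topics more often.
    """
    agree = 0
    for a, neighbours in graph.items():
        for b in neighbours:
            if a < b and topics[a] == topics[b]:  # visit each edge once
                agree += 1
    return agree


doc = "einstein developed relativity in physics and played music".split()
g = document_graph(doc, TOY_KG)
print(sorted((a, b) for a in g for b in g[a] if a < b))
# [('einstein', 'physics'), ('einstein', 'relativity'), ('physics', 'relativity')]
```

Here "music" stays isolated because its only related entity, "guitar", does not occur in the document. Assigning every word topic 0 gives an agreement of 3; moving "physics" alone to topic 1 drops it to 1, which is the kind of signal a random-field prior can use to keep KG-related words on a coherent topic.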