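Project title: Research on Construction Techniques for a Chinese Dynamic Semantic Web (中文动态语义网构建技术研究)
Project number: No. 61272344
Project type: General Program (面上项目)
Year of approval: 2013
Discipline: Automation technology; computer technology
Project author: 赵东岩 (Zhao Dongyan)
Affiliation: Peking University (北京大学)
Funding: 800,000 yuan (80万元)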
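Chinese abstract (translated): The Semantic Web is an initiative proposed by the W3C to give structure to existing Web information. Compared with the traditional Web, the Semantic Web better supports semantic retrieval and returns more accurate query results, so building it has become an active research topic in computer science. One important way to build the Semantic Web is to use information extraction to obtain semantic knowledge from unstructured documents and assemble it into a network of semantic relations. Current work on Semantic Web construction often overlooks both the timeliness and the uncertainty of the extracted knowledge. This project therefore proposes techniques for building a Chinese dynamic Semantic Web, using information extraction to obtain semantic data from Chinese encyclopedia websites and Chinese news pages. Specifically, basic attribute information about Chinese entities is extracted from Chinese encyclopedia websites, and real-time semantic elements of news, the 5W1H (who, what, when, where, why, how), are extracted from news pages. The two kinds of semantic data are then integrated to form a Chinese dynamic Semantic Web with high timeliness. In addition, because semantic data obtained through information extraction is uncertain, the project designs retrieval algorithms over an uncertain Semantic Web to improve the accuracy of semantic retrieval.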
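Chinese keywords (translated): Chinese knowledge graph construction; information extraction; entity relation extraction; graph data management; subgraph query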
English abstract: The Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C) that promotes common formats for data on the World Wide Web. Compared with the traditional Web, the Semantic Web supports semantic search and provides more precise search results, and it has therefore attracted considerable research attention. An effective way to build the Semantic Web is to use information extraction techniques to obtain semantic knowledge from unstructured Web pages. However, recent research ignores two important features of Semantic Web construction, namely "timeliness" and "uncertainty". Furthermore, existing research focuses on extracting knowledge facts from English resources, and there is less work on Chinese. In this project, we propose to build a Chinese Semantic Web from online Chinese encyclopedias and Chinese news websites. Specifically, we extract entities and their basic property values from online Chinese encyclopedias, and extract dynamic knowledge (e.g., the 5W1H in news) from Chinese news websites. We also integrate the knowledge from these two distinct sources. Throughout the construction process, we take "timeliness" and "uncertainty" into account. Finally, we propose uncertain semantic search algorithms over the Semantic Web to improve query quality.
English keywords: Chinese Knowledge Graph Construction; Information Extraction; Entity Relation Extraction; Graph Data Management; Subgraph Query
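
Illustrative note: the abstract's two concerns, "timeliness" and "uncertainty", can be pictured as extra fields attached to each extracted fact. The following is only a toy sketch under stated assumptions, not the project's method: the `Fact` record, the `search` function, the confidence scores, and the sample entities (`EntityA`, `CityX`, and so on) are all hypothetical and do not come from the project record.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    """One extracted triple plus the two properties the abstract highlights."""
    subject: str
    predicate: str
    obj: str
    confidence: float  # hypothetical extractor confidence in [0, 1] (uncertainty)
    observed: date     # date of the source page (timeliness)


def search(facts: List[Fact],
           subject: Optional[str] = None,
           predicate: Optional[str] = None,
           min_confidence: float = 0.5,
           since: Optional[date] = None) -> List[Fact]:
    """Return matching facts, most confident and most recent first."""
    hits = [
        f for f in facts
        if (subject is None or f.subject == subject)
        and (predicate is None or f.predicate == predicate)
        and f.confidence >= min_confidence       # drop low-confidence extractions
        and (since is None or f.observed >= since)  # drop stale facts
    ]
    return sorted(hits, key=lambda f: (f.confidence, f.observed), reverse=True)


# Made-up sample data: two encyclopedia-style attributes and one news-style event.
facts = [
    Fact("EntityA", "located_in", "CityX", 0.95, date(2013, 1, 10)),
    Fact("EntityA", "located_in", "CityY", 0.20, date(2013, 1, 12)),
    Fact("EntityA", "attended", "EventZ", 0.70, date(2013, 5, 4)),
]

for fact in search(facts, subject="EntityA", predicate="located_in"):
    print(fact.obj, fact.confidence)
```

With the default threshold, the low-confidence `CityY` extraction is filtered out; passing `since=` restricts results to recently observed facts instead. A real system would need a principled probability model and subgraph matching rather than this flat filter.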