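Project title: Research on Ontology-Oriented Declarative Extraction Methods for Large-Scale Semantic Information
Project number: No. 61272110
Project type: General Program
Year of approval: 2013
Discipline: Automation technology, computer technology
Principal investigator: Li Xuhui (李旭晖)
Institution: Wuhan University (武汉大学)
Funding: 800,000 CNY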
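Abstract (translated from Chinese): Semantic information extraction from large-scale data is the foundation for building all kinds of semantic information services. Implementing information extraction with data management methods, typified by declarative queries, is a frontier research topic in related fields. However, existing research lacks a suitable semantic data model as support, which separates semantic information processing from semantic data structures during extraction, impedes the deep integration of data management methods with information extraction techniques, and makes large-scale information extraction tasks harder to realize. To address this, the project will design a semantic data model oriented to information extraction that reflects, in a reasonable and consistent form, the multi-layered, multi-faceted, and polysemous characteristics that data semantics exhibit during extraction; design a declarative query language, built on extraction patterns that can summarize data features, to express extraction requirements, and study a corresponding processing algebra and optimization methods suited to semantic information extraction; and design information extraction strategies through ontology concept mapping and importance analysis, achieving ontology-oriented, semi-automatic extraction of large-scale semantic information. The research captures the characteristics of extraction from the perspective of semantic evolution, characterizes extraction requirements by summarizing data features, performs extraction computation through query processing, drives extraction tasks with ontology information, and builds a practical system to validate the extraction methods; it has considerable theoretical value and broad application prospects.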
Keywords (translated from Chinese): Semantic data modeling; information extraction; query language; query optimization; topic model
English abstract: Extracting semantic information from large-scale unstructured data plays a fundamental role in building semantic information services. Carrying out extraction tasks with typical data management methods, such as declarative queries, is a new trend in related fields. However, current studies often lack an appropriate semantic data model as the basis for data querying. This creates a gap between semantic information processing and semantic data structures during extraction, hampers the integration of data management and information extraction methods, and hinders efficient solutions to large-scale information extraction. In this proposal, we will design a semantic data model oriented to information extraction that can depict the multi-layered, multi-faceted, and polysemous features of data semantics during information extraction. Based on the data model, we will use patterns representing data features to extract related data elements, and design a declarative query language based on these patterns to express information extraction tasks. We will also study the query processing algebra and optimization methods for information extraction queries. Further, we will establish ontology mapping mechanisms based on the language and propose an extraction policy based on analyzing
English keywords: Semantic Data Modeling; Information Extraction; Query Language; Query Optimization; Topic Model