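Project title: Research on Methods for Acquiring Attribute and Attribute-Value Knowledge from Web Text
Project number: No.61272361
Project type: General Program
Approval year: 2013
Discipline: Automation technology, computer technology
Principal investigator: Zhang Chunxia (张春霞)
Affiliation: Beijing Institute of Technology
Funding: 800,000 CNY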
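Abstract (translated from Chinese): Acquiring knowledge of the attributes and attribute values of concepts and their instances is a frontier topic in Web text mining and information extraction. Attribute and attribute-value knowledge is a core component of ontologies, an important foundation for building the Semantic Web, and a prerequisite for knowledge sharing and interoperability. Acquiring this knowledge has become a bottleneck constraining the development of intelligent information processing technologies such as information retrieval and text classification. Existing work mainly extracts explicit attributes and attribute values from structured web pages and from semi-structured web pages dominated by list-type text, and the resulting methods are often limited to specific domains, concepts, or attributes. To address these problems, this project will systematically study theoretical models and core methods for acquiring attribute and attribute-value knowledge of concepts and concept instances from Web text, specifically: (1) models and methods for representing attribute and attribute-value knowledge in Web text; (2) a multi-dimensional classification system for attributes and attribute values; (3) domain-adaptive methods for extracting and learning explicit and implicit attribute and attribute-value knowledge; (4) methods for verifying attribute and attribute-value knowledge. On this basis, a knowledge acquisition platform for concepts and concept instances will be developed, and the proposed knowledge extraction, learning, and verification methods will be evaluated and analyzed on that platform.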
Keywords (translated from Chinese): attributes; attribute values; knowledge acquisition; knowledge verification; Web mining
English abstract: Automatic acquisition of the attributes and attribute values of concepts and instances is one of the research frontiers in web content mining and information extraction. Knowledge of attributes and their values is a crucial component of ontologies, a basis for building the Semantic Web, and a precondition for knowledge sharing and interoperability. Acquiring this kind of knowledge has become a bottleneck hindering the development of intelligent information processing techniques such as information retrieval, text classification, and text clustering. Current work mainly focuses on extracting explicit attributes and their values from structured web pages and from semi-structured web pages with item lists. Moreover, present methods are usually restricted to specific domains, concepts, or attributes. To solve these problems, this project will systematically study theoretical models and core algorithms for acquiring attributes and their values from web texts. The research contents include: (1) constructing representation models of attributes and their values in web texts; (2) building a multi-dimensional classification framework of attributes and their values; (3) designing a domain-adaptive approach to extracting and learning explicit and implicit attributes and their values; (4) devising a verification approach of knowledge about
English keywords: attributes; attribute values; knowledge acquisition; knowledge verification; web mining