Research on Knowledge Extraction from Online Encyclopedias

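Project name: Research on Knowledge Extraction from Online Encyclopedias

Project number: No. 61472436

Project type: General Program

Year of approval: 2015

Discipline: Computer Science

Project author: Wang Ting (王挺)

Affiliation: National University of Defense Technology of the Chinese People's Liberation Army

Funding: RMB 830,000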

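Chinese abstract (translated): Driven by Web 2.0, online encyclopedias have developed rapidly as platforms for collective intelligence and have become a disruptive innovation replacing traditional printed encyclopedias. They not only offer users a wealth of information but also hold large-scale potential knowledge for intelligent computer applications. However, online encyclopedias consist mainly of ordinary text, which is hard for computing systems to use automatically; only structured knowledge bases can be used effectively by intelligent systems. Therefore, in light of the current state of information extraction and online encyclopedias and the challenges they face, and with the goal of raising the intelligence of web information services, research on knowledge extraction from large-scale online encyclopedias that accounts for how their knowledge is organized and expressed has significant practical value and scientific significance. This project takes the open information of online encyclopedias such as Wikipedia, Hudong Baike, and Baidu Baike as its object. Addressing the newly emerging requirements of openness, adaptability, and scale in information extraction, it studies extraction methods that are open, extensible, and highly automated, converting the weakly structured text in online encyclopedias into structured knowledge that other intelligent systems can use directly, thereby advancing the intelligent processing of web information.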

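Chinese keywords (translated): Information Extraction; Natural Language Processing; Text Mining; Information Retrieval; Knowledge Engineering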

English abstract: Driven by Web 2.0 technology, the Internet encyclopedia has developed rapidly as a collective intelligence platform and is regarded as a disruptive innovation that replaces the traditional printed encyclopedia. Internet encyclopedias offer users a wealth of information and also provide intelligent application systems with potential large-scale knowledge. However, Internet encyclopedias consisting mainly of ordinary text are hard for computer systems to use automatically; only a structured knowledge base can be used effectively. Thus, considering the current status of Internet encyclopedias and the challenges in information extraction research, extracting knowledge from large-scale Internet encyclopedias, with the aim of improving the intelligence of web information services, has high practical value and scientific significance. The proposed project focuses on new requirements in information extraction, such as openness and scalability, and seeks to develop open, scalable, and highly automated information extraction methods that extract structured knowledge from the weakly structured text in Wikipedia, Baidu Encyclopedia, and similar sources. The extracted knowledge is expected to be further exploited by AI systems to promote intelligent Web information processing.

English keywords: Information Extraction; Natural Language Processing; Text Mining; Information Retrieval; Knowledge Engineering
