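Project title: Research on Key Technologies for Chinese Entity Knowledge Mining in the Internet Environment
Project number: No.61202329
Project type: Young Scientists Fund
Year of approval: 2013
Discipline: Computer Science
Project author: Liu Kang (刘康)
Affiliation: Institute of Automation, Chinese Academy of Sciences
Funding: 230,000 CNY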
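Abstract (translated from Chinese): Mining knowledge such as entities, entity categories, and entity relations from complex and ever-changing Web data, organizing it, and establishing semantic links among knowledge items provides important support for text understanding, information retrieval, and question answering systems. Internet data is massive and uncertain, multi-source and heterogeneous, dynamically changing, and noisy. Targeting these characteristics, this proposal studies techniques for mining Chinese entity knowledge in the Internet environment. The specific research content includes: (1) to meet the knowledge representation requirements of "diverse relations, computability, and probabilistic description", studying entity knowledge representation based on multi-layer semantic graphs, together with methods for automatically constructing the knowledge system; (2) making full use of the differences, complementarity, and correlations among Web information, studying methods for collaboratively mining and verifying Chinese entity knowledge based on Web information association; (3) studying large-scale probabilistic logical reasoning methods and exploring, from the perspective of knowledge reasoning, ways to acquire new knowledge from the Web; (4) building an experimental entity knowledge base, and validating and testing the above key techniques on the research group's existing encyclopedic knowledge question answering platform. The results of this project will provide a reference for natural language understanding and for deep computation over Internet information.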
Keywords (translated from Chinese): open-domain information extraction; entity; entity relation
English abstract: Mining entity knowledge (entities, entity categories, and entity relations) has a significant impact on many applications, such as text understanding, information retrieval, and question answering systems. This proposal studies techniques for mining Chinese entity knowledge from Web data that is massive, uncertain, multi-source, heterogeneous, dynamic, and noisy. The main tasks are: (1) To meet the knowledge representation requirements of diverse relations, computability, and probabilistic description, we study knowledge representation based on multi-layer semantic graphs and methods for automatically constructing the knowledge framework. (2) Making full use of the differences, complementarity, and correlations among Web information, we study collaborative methods for mining and verifying entity knowledge from the Web. (3) We study methods for acquiring new knowledge from the perspective of large-scale probabilistic logical reasoning. (4) We construct an experimental entity knowledge base and test the above key techniques on our existing Chinese encyclopedia QA platform. The results of this project will provide a reference for natural language understanding and for deep computation over Web information.
English keywords: Open Information Extraction; Entity; Entity Relation