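Project title: Research on Entity Linking Techniques for Knowledge Bases
Project number: No. 61502253
Project type: Young Scientists Fund project
Year approved: 2016
Discipline: Automation technology, computer technology
Principal investigator: Wei Shen (沈玮)
Affiliation: Nankai University
Funding: 210,000 CNY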
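Chinese abstract (translated): Knowledge bases play an increasingly important role in fields such as information retrieval and question answering. Because the coverage of existing knowledge bases is rather limited, extending them is both necessary and worthwhile. Entity linking is regarded as an important subtask of knowledge base population. It is the process of finding, for an entity name that appears in Web data, the corresponding entity in a knowledge base. Entity linking also helps with text understanding, information extraction, and content analysis. However, owing to the ambiguity of entity names and the diversity and heterogeneity of Web data, entity linking is highly challenging. This project studies key techniques for entity linking with knowledge bases, addressing some of the problems and shortcomings of existing methods. The specific research content includes: (1) leveraging crowd intelligence, a hybrid framework that combines crowdsourcing with entity linking algorithms to improve linking quality; (2) for large-scale applications, efficient entity linking algorithms that improve linking efficiency; (3) based on the structural characteristics of domain-specific knowledge bases, a general entity linking method for domain-specific knowledge bases, remedying the limitation that current algorithms can link only to general-purpose knowledge bases.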
Chinese keywords (translated): entity linking; knowledge base; entity disambiguation; text understanding
English abstract: Knowledge bases are increasingly important for information retrieval and question answering. Because the coverage of existing knowledge bases is limited, populating them is both necessary and worthwhile. Entity linking is regarded as an important subtask of knowledge base population: it is the task of linking entity mentions in Web data to their corresponding entities in a knowledge base. Potential applications include text understanding, information extraction, and content analysis. However, the task is challenging because entity names are ambiguous and Web data are diverse and heterogeneous. This project studies key techniques for entity linking with a knowledge base in order to overcome deficiencies of existing methods. Specifically: (1) we investigate a hybrid framework that combines crowdsourcing with entity linking algorithms, leveraging crowd intelligence to increase linking accuracy; (2) we study highly efficient entity linking algorithms to increase linking efficiency for large-scale applications; (3) based on the characteristics of domain-specific knowledge bases, we study a general entity linking framework for domain-specific knowledge bases, overcoming the limitation of existing approaches that can link only to general-purpose knowledge bases.
English keywords: Entity Linking; Knowledge Base; Entity Disambiguation; Text Understanding
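The abstract describes entity linking and the hybrid crowdsourcing framework only at a high level. The sketch below is a toy illustration under stated assumptions, not the project's method. It assumes a hypothetical alias table (`ALIASES`) mapping a surface name to candidate entities with link counts, a hypothetical bag-of-words description per entity (`KB`), a score of prior × (1 + context-word overlap), and a hypothetical confidence `threshold` below which a mention is routed to crowd workers instead of being accepted automatically.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical toy knowledge base: entity id -> bag-of-words description.
KB: Dict[str, str] = {
    "Michael_Jordan_(basketball)": "american basketball player chicago bulls nba",
    "Michael_I._Jordan": "american scientist machine learning berkeley professor",
}

# Hypothetical alias table: lower-cased surface name -> {entity id: link count}.
ALIASES: Dict[str, Dict[str, int]] = {
    "michael jordan": {"Michael_Jordan_(basketball)": 90, "Michael_I._Jordan": 10},
}


@dataclass
class Linking:
    mention: str
    entity: Optional[str]  # None means NIL: no candidate entity in the KB
    confidence: float
    needs_crowd: bool      # True if the decision should be checked by crowd workers


def link(mention: str, context: str, threshold: float = 0.8) -> Linking:
    """Link one mention; flag low-confidence decisions for crowdsourcing."""
    # Candidate generation: look the surface name up in the alias table.
    candidates = ALIASES.get(mention.lower(), {})
    if not candidates:
        return Linking(mention, None, 0.0, False)

    total = sum(candidates.values())
    context_words = set(context.lower().split())
    scores = {}
    for entity, count in candidates.items():
        prior = count / total  # popularity prior of the entity given the name
        overlap = len(context_words & set(KB[entity].split()))
        scores[entity] = prior * (1 + overlap)

    # Disambiguation: pick the top-scoring candidate; its normalized score
    # serves as the confidence that decides whether the crowd is consulted.
    best = max(scores, key=scores.get)
    confidence = scores[best] / sum(scores.values())
    return Linking(mention, best, confidence, confidence < threshold)


if __name__ == "__main__":
    mentions = [
        ("Michael Jordan", "he won the nba title with the chicago bulls"),
        ("Michael Jordan", "the machine learning professor at berkeley"),
    ]
    results = [link(m, c) for m, c in mentions]
    crowd_queue = [r for r in results if r.needs_crowd]
    for r in results:
        print(r)
    print(len(crowd_queue), "mention(s) sent to the crowd")
```

In this toy run the first mention is linked automatically because the prior and the context agree. The second is flagged for the crowd because the popularity prior favors the basketball player while the context points to the scientist, so the top score is not dominant. Sending only such uncertain cases to workers is one simple way to keep crowdsourcing cost proportional to ambiguity.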