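Project title: Research on Neural Network-Based Cross-Lingual Entity Linking
Project number: No.61502035
Project type: Young Scientists Fund project
Year of approval: 2016
Discipline: Computer Science
Project author: 郭宇航
Affiliation: Beijing Institute of Technology (北京理工大学)
Funding: 210,000 CNY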
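Chinese abstract (translated): Cross-lingual entity linking links a name that appears in context in one language to the corresponding entity in a knowledge base in another language. The technique bridges the language gap in knowledge: on the one hand, it makes the fullest use of the knowledge bases on the Internet that are expressed in different languages; on the other hand, it supports information processing for languages that lack knowledge bases. The difficulty of cross-lingual entity linking lies in computing the similarity between texts expressed in different languages. This project studies in depth neural-network-based methods for representing context semantics. Word-vector-based translation mitigates the effect of out-of-vocabulary words on cross-lingual text similarity; paragraph-vector-based translation uses global information in the context to compute the similarity between cross-lingual texts; and mapping different languages into the same paragraph vector space makes it possible to compute cross-lingual text similarity directly, without translation, thereby reducing the cascading errors introduced by the translation step.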
Chinese keywords (translated): Entity Linking; Cross-Lingual; Neural Network; Word Vector; Paragraph Vector
English abstract: Cross-lingual entity linking is a technique that links a name presented in one language to its referent entity in a knowledge base described in another language. This technique bridges the language gap in knowledge. On the one hand, it can leverage knowledge bases in different languages on the Internet. On the other hand, it can support information processing for languages that lack knowledge bases. The difficulty of cross-lingual entity linking lies in computing the similarity between texts represented in different languages. This project investigates context semantic representation based on neural networks. Through word-vector-based translation, we can alleviate the effect of out-of-vocabulary words on cross-lingual text similarity. Through paragraph-vector-based translation, we can use the global information in the context to calculate the similarity between cross-lingual texts. By mapping different languages into a single paragraph vector space, we can calculate cross-lingual text similarity without translation, which reduces the cascading errors introduced by the translation step.
English keywords: Entity Linking; Cross-Lingual; Neural Network; Word Vector; Paragraph Vector
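
Illustrative sketch (not part of the project record): the abstract's last idea, comparing texts in a shared paragraph vector space without translating them, might look roughly like the following. Everything here is an assumption made for illustration: the linear `mapping` matrix, the two-dimensional toy vectors, the candidate entity names, and the helper functions are all hypothetical, and the record does not say how the project actually learns or applies the mapping.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_entity(context_vec, mapping, candidates):
    """Map a source-language context vector into the shared space and
    return the name of the most similar candidate entity."""
    mapped = matvec(mapping, context_vec)
    return max(candidates, key=lambda name: cosine(mapped, candidates[name]))


# Toy, made-up data. In practice the mapping and the vectors would be learned.
mapping = [[0.0, 1.0],
           [1.0, 0.0]]  # hypothetical linear map between the two vector spaces
candidates = {
    "Apple_Inc.": [0.9, 0.1],      # hypothetical knowledge-base entity vectors
    "Apple_(fruit)": [0.1, 0.9],
}
context_vec = [0.2, 0.8]  # hypothetical paragraph vector of the mention's context

best = link_entity(context_vec, mapping, candidates)
print(best)
```

The sketch ranks candidates by cosine similarity only; a real linker would typically combine such a score with other evidence, for example candidate priors.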