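Project title: Research on Large-Scale Knowledge Graph Construction Methods Based on Word Embedding Representations
Project number: No.61472428
Project type: General Program (面上项目)
Year of approval: 2015
Discipline: Computer Science
Project author: Liu Tao (刘桃)
Author affiliation: Renmin University of China (中国人民大学)
Project funding: RMB 800,000 (80万元)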
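Chinese abstract (English translation): Constructing large-scale knowledge graphs is the foundation for enabling computers to perform intelligent reasoning. Feature representation is an important factor limiting the quality of knowledge graph construction. Traditional feature representation methods suffer from weak expressive power, a lack of semantic computability, and a complex feature design process, whereas word-embedding feature representations based on deep learning are highly expressive and are learned fully automatically. Based on word embedding learning, this project will develop new feature representations for the basic elements of a knowledge graph (such as named entities and relations), and will then study automatic knowledge graph construction methods based on word-embedding features and deep neural networks, so that large-scale knowledge graphs can be widely applied in practice. For word embedding learning, the project addresses the efficiency and quality of learning by adjusting the deep neural network architecture and introducing prior linguistic knowledge. For the individual knowledge graph subtasks, on the one hand it introduces word-clustering features derived from word embeddings into existing algorithms and effectively fuses them with the original features; on the other hand it proposes a method for designing deep neural network architectures targeted at knowledge graphs and, building on it, new word-embedding-based algorithms for entity and relation recognition.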
Chinese keywords (English translation): knowledge graph;word embeddings;feature representation;text mining
English abstract: Knowledge graphs are a foundation of artificial intelligence. Feature representation is a key factor affecting the quality of a constructed knowledge graph. Traditional feature representation methods have the following limitations: 1) word-based features carry only limited information and cannot express syntactic and semantic information; 2) the feature design process relies on domain experts and is time-consuming. In this proposal we study knowledge graph construction based on word embeddings, which have the following advantages: 1) word embeddings can express much more information than traditional word-based features; 2) features represented by word embeddings are semantically computable and can be learned through an automatic feature learning process. This proposal focuses on the feature representation of the basic elements of a knowledge graph, such as entities and relations, and on automatic knowledge graph construction methods based on word embeddings and deep neural networks. For feature representation, we propose a method that integrates prior linguistic knowledge to improve the quality of word embeddings. We also propose word clustering based on word embeddings and introduce the resulting cluster features into state-of-the-art knowledge graph construction methods. Finally, and most importantly, we propose two new deep neural networks for the subtasks of knowledge graph construction.
English keywords: knowledge graph;word embeddings;feature representation;text mining