Knowledge Graphs (KGs) are of vital importance for multiple applications on the web, including information retrieval, recommender systems, and metadata annotation. Regardless of whether they are built manually by domain experts or with automatic pipelines, KGs are often incomplete. Recent work has begun to explore the use of textual descriptions available in knowledge graphs to learn vector representations of entities in order to perform link prediction. However, the extent to which representations learned for link prediction generalize to other tasks is unclear. This is important given the cost of learning such representations. Ideally, we would prefer representations that do not need to be trained again when transferring to a different task, while retaining reasonable performance. In this work, we propose a holistic evaluation protocol for entity representations learned via a link prediction objective. We consider the inductive link prediction and entity classification tasks, which involve entities not seen during training. We also consider an information retrieval task for entity-oriented search. We evaluate an architecture based on a pretrained language model that exhibits strong generalization to entities not observed during training, and outperforms related state-of-the-art methods (a 22% average MRR improvement in link prediction). We further provide evidence that the learned representations transfer well to other tasks without fine-tuning. In the entity classification task, we obtain an average improvement of 16% in accuracy compared with baselines that also employ pretrained models. In the information retrieval task, we obtain significant improvements of up to 8.8% in NDCG@10 for natural language queries. We thus show that the learned representations are not limited to KG-specific tasks, and generalize more broadly than evaluated in previous work.
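To make the setup concrete, below is a minimal sketch, not the released implementation, of how an entity encoder of this kind could be structured: entity descriptions are encoded with a pretrained language model, projected into the KG embedding space, and candidate triples are scored with a TransE-style distance, so unseen entities can be embedded from their text alone at inference time. The class name, projection dimension, and relation count are illustrative assumptions.

```python
# Sketch only: a text-based entity encoder trained with a link prediction
# objective. Names (TextEntityEncoder, dim=128, num_relations=237) are
# assumptions for illustration, not the paper's released code.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TextEntityEncoder(nn.Module):
    def __init__(self, model_name="bert-base-uncased", dim=128, num_relations=237):
        super().__init__()
        self.lm = AutoModel.from_pretrained(model_name)
        # Project the LM output into the KG embedding space.
        self.proj = nn.Linear(self.lm.config.hidden_size, dim)
        # Relations are a fixed vocabulary, so a lookup table suffices.
        self.rel = nn.Embedding(num_relations, dim)

    def encode(self, input_ids, attention_mask):
        out = self.lm(input_ids=input_ids, attention_mask=attention_mask)
        # Use the [CLS] token representation as the entity embedding.
        return self.proj(out.last_hidden_state[:, 0])

    def score(self, head, rel_ids, tail):
        # TransE-style plausibility: higher (less negative) = more plausible.
        return -(head + self.rel(rel_ids) - tail).norm(p=1, dim=-1)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = TextEntityEncoder()
enc = tokenizer(["Berlin is the capital of Germany.",
                 "Germany is a country in central Europe."],
                padding=True, return_tensors="pt")
emb = model.encode(enc.input_ids, enc.attention_mask)
rel = torch.tensor([0])  # hypothetical index of a "capital_of" relation
print(model.score(emb[0:1], rel, emb[1:2]))
```

Because the encoder never needs an entity-specific lookup table, any entity with a textual description can be embedded after training, which is what enables the inductive evaluation described above.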