Pre-trained language models such as BERT have been a key ingredient for achieving state-of-the-art results on a variety of tasks in natural language processing and, more recently, also in information retrieval. Recent research even claims that BERT is able to capture factual knowledge about entity relations and properties, information that is commonly obtained from knowledge graphs. This paper investigates the following question: Do BERT-based entity retrieval models benefit from additional entity information stored in knowledge graphs? To address this research question, we map entity embeddings into the same input space as a pre-trained BERT model and inject these entity embeddings into the BERT model. This entity-enriched language model is then employed on the entity retrieval task. We show that the entity-enriched BERT model improves effectiveness on entity-oriented queries over a regular BERT model, establishing a new state-of-the-art result for the entity retrieval task, with substantial improvements for complex natural language queries and queries requesting a list of entities with a certain property. Additionally, we show that the entity information provided by our entity-enriched model particularly helps queries related to less popular entities. Finally, we observe empirically that the entity-enriched BERT model enables fine-tuning on limited training data, which would otherwise be infeasible due to the known instabilities of BERT in few-sample fine-tuning, thereby contributing to data-efficient training of BERT for entity search.
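The core idea, projecting externally trained entity embeddings into BERT's wordpiece embedding space, can be illustrated with a short sketch. The snippet below is a minimal illustration and not the paper's implementation: it assumes Wikipedia2Vec-style embeddings in which words and entities share one vector space, uses random placeholder vectors throughout, and fits the projection as a linear least-squares map over vocabulary words present in both spaces (one common way to align two embedding spaces).

```python
# Minimal sketch (not the paper's code): align external entity embeddings with
# BERT's input space via a linear map fitted on a shared word vocabulary.
# All external vectors below are random placeholders for illustration.
import torch
from transformers import BertModel, BertTokenizer

bert = BertModel.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
wordpiece_emb = bert.get_input_embeddings().weight.detach()    # (vocab, 768)

# Assumed Wikipedia2Vec-style setup: words and entities live in one 300-d space.
num_shared, num_entities, ext_dim = 5000, 100, 300
shared_ids = torch.randint(0, wordpiece_emb.size(0), (num_shared,))  # placeholder
ext_word_vecs = torch.randn(num_shared, ext_dim)      # placeholder word vectors
ext_entity_vecs = torch.randn(num_entities, ext_dim)  # placeholder entity vectors

# Fit W minimizing ||ext_word_vecs @ W - wordpiece_emb[shared_ids]||_F, mapping
# the external 300-d space into BERT's 768-d input embedding space.
W = torch.linalg.lstsq(ext_word_vecs, wordpiece_emb[shared_ids]).solution

# Project the entity vectors; each row can now act as an extra input "token".
entity_inputs = ext_entity_vecs @ W                    # (num_entities, 768)

# Splice one projected entity between the query tokens and [SEP], then run
# BERT on the embedding sequence directly via inputs_embeds.
tok = tokenizer("who founded", return_tensors="pt")
tok_embs = bert.get_input_embeddings()(tok["input_ids"])       # (1, seq, 768)
ent = entity_inputs[0].view(1, 1, -1)
inputs_embeds = torch.cat([tok_embs[:, :-1], ent, tok_embs[:, -1:]], dim=1)
out = bert(inputs_embeds=inputs_embeds)                # contextualized outputs
```

In the full model, such an entity-enriched input would then be fine-tuned end-to-end on the entity retrieval task, for instance with a relevance-ranking head on top of BERT.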