The ability to automatically extract Knowledge Graphs (KGs) from a given collection of documents is a long-standing problem in Artificial Intelligence. One way to assess this capability is through the task of slot filling. Given an entity query in the form [Entity, Slot, ?], a system is asked to "fill" the slot by generating or extracting the missing value from one or more relevant passages. This capability is crucial for building systems for automatic knowledge base population, which are in ever-increasing demand, especially in enterprise applications. A promising recent direction evaluates language models in the same way we would evaluate knowledge bases, and slot filling is the task best suited to this purpose. Recent advances in the field attempt to solve this task in an end-to-end fashion using retrieval-based language models. Models like Retrieval Augmented Generation (RAG) show surprisingly good performance without involving complex information extraction pipelines. However, the results achieved by these models on the two slot filling tasks in the KILT benchmark are still not at the level required by real-world information extraction systems. In this paper, we describe several strategies we adopted to improve the retriever and the generator of RAG in order to make it a better slot filler. Our KGI0 system (available at https://github.com/IBM/retrieve-write-slot-filling) reached the top-1 position on the KILT leaderboard on both the T-REx and zsRE datasets by a large margin.
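To make the [Entity, Slot, ?] query format concrete, the following is a minimal sketch of a retrieve-then-read slot filler. It is not the KGI0 system or RAG: the `SlotQuery` class, the `[SEP]` serialization, the token-overlap retriever, and the regex-based `toy_reader` are all illustrative assumptions standing in for a learned retriever and generator.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class SlotQuery:
    """A slot filling query [Entity, Slot, ?] (hypothetical helper class)."""
    entity: str
    slot: str

    def as_text(self) -> str:
        # Assumed serialization of the query; the separator is illustrative.
        return f"{self.entity} [SEP] {self.slot}"


def tokenize(text: str) -> set:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return {tok.strip(".,;:!?()[]").lower() for tok in text.split()} - {""}


def retrieve(query: SlotQuery, passages: List[str], k: int = 1) -> List[str]:
    """Rank passages by token overlap with the query (a stand-in for a
    learned dense retriever) and return the top k."""
    q_tokens = tokenize(query.entity) | tokenize(query.slot)
    ranked = sorted(passages, key=lambda p: len(q_tokens & tokenize(p)), reverse=True)
    return ranked[:k]


def toy_reader(query: SlotQuery, context: List[str]) -> Optional[str]:
    """Toy stand-in for a generator: only handles 'born in <Place>' patterns."""
    for passage in context:
        match = re.search(r"born in (\w+)", passage)
        if match:
            return match.group(1)
    return None


def fill_slot(
    query: SlotQuery,
    passages: List[str],
    reader: Callable[[SlotQuery, List[str]], Optional[str]],
    k: int = 1,
) -> Optional[str]:
    """Retrieve supporting passages, then let the reader produce the slot value."""
    return reader(query, retrieve(query, passages, k))


# Made-up example passages, for illustration only.
PASSAGES = [
    "Paris is the capital of France.",
    "Albert Einstein was born in Ulm, Germany.",
    "Ulm is a city on the Danube.",
]

if __name__ == "__main__":
    query = SlotQuery("Albert Einstein", "place of birth")
    print(query.as_text())
    print(fill_slot(query, PASSAGES, toy_reader))
```

In an end-to-end retrieval-based model, both `retrieve` and the reader would be replaced by trained neural components; the sketch only shows how the query, the retrieved passages, and the filled value relate.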