Information Extraction (IE) from text refers to the task of extracting structured knowledge from unstructured text. The task typically consists of a series of sub-tasks such as Named Entity Recognition and Relation Extraction. Sourcing entity- and relation-type-specific training data is a major bottleneck in these sub-tasks. In this work, we present a slot filling approach to biomedical IE that effectively removes the need for entity- and relation-specific training data and thereby makes it possible to handle zero-shot settings. We follow the recently proposed paradigm of coupling a Transformer-based bi-encoder, Dense Passage Retrieval, with a Transformer-based reader model to extract relations from biomedical text. We assemble a biomedical slot filling dataset for both retrieval and reading comprehension and conduct a series of experiments demonstrating that our approach outperforms a number of simpler baselines. We also evaluate our approach end-to-end in both standard and zero-shot settings. Our work provides a fresh perspective on how to solve biomedical IE tasks in the absence of relevant training data. Our code, models and pretrained data are available at https://github.com/healx/biomed-slot-filling.
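To make the retrieve-then-read paradigm concrete, the following is a minimal, illustrative sketch of slot filling as retrieval followed by reading. It is not our released implementation: the bag-of-words `embed` function is a toy stand-in for a Transformer-based bi-encoder, the pattern-matching `read` function is a toy stand-in for a Transformer-based reader, and the `head [SEP] relation` query format, the function names, and the example passages are all assumptions made for illustration.

```python
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a Transformer encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def dot(u, v):
    """Inner product of two sparse count vectors."""
    return sum(u[t] * v[t] for t in u if t in v)


def retrieve(query, passages, k=1):
    """Bi-encoder style retrieval: query and passages are encoded
    independently and ranked by inner product."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: dot(q, embed(p)), reverse=True)
    return ranked[:k]


def read(head, relation, passage):
    """Toy stand-in for a reader: return the token that follows
    '<head> <relation>' in the passage, or None if there is no match."""
    pattern = re.escape(head) + r"\s+" + re.escape(relation) + r"\s+([\w-]+)"
    match = re.search(pattern, passage, flags=re.IGNORECASE)
    return match.group(1) if match else None


def fill_slot(head, relation, passages, k=2):
    """Fill the tail slot of (head, relation, ?) by retrieving the top-k
    passages and reading an answer from the first one that yields a span."""
    query = f"{head} [SEP] {relation}"  # assumed query format
    for passage in retrieve(query, passages, k=k):
        answer = read(head, relation, passage)
        if answer is not None:
            return answer
    return None


# Hypothetical passages, for illustration only.
PASSAGES = [
    "Aspirin inhibits cyclooxygenase and reduces inflammation.",
    "Metformin treats hyperglycemia in adults.",
    "Insulin regulates glucose uptake in muscle tissue.",
]

print(fill_slot("Metformin", "treats", PASSAGES))   # hyperglycemia
print(fill_slot("Aspirin", "inhibits", PASSAGES))   # cyclooxygenase
print(fill_slot("Ibuprofen", "treats", PASSAGES))   # None
```

In this sketch, the relation is supplied as part of the query rather than as a class label. This is what allows a slot filling system to be queried for relation types it has not been explicitly trained on, although the toy components here involve no learning at all.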