Biomedical entity linking is the task of identifying mentions of biomedical concepts in text documents and mapping them to canonical entities in a target thesaurus. Recent advances in entity linking with BERT-based models follow a retrieve-and-rerank paradigm: a retriever model first selects candidate entities, and a reranker model then ranks the retrieved candidates. While this paradigm produces state-of-the-art results, such models are slow at both training and test time because they can process only one mention at a time. To mitigate these issues, we propose a BERT-based dual encoder model that resolves multiple mentions in a document in one shot. We show that our proposed model is multiple times faster than existing BERT-based models while remaining competitive in accuracy for biomedical entity linking. Additionally, we modify our dual encoder model to perform end-to-end biomedical entity linking, covering both mention span detection and entity disambiguation, and show that it outperforms two recently proposed models.
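To make the "multiple mentions in one shot" idea concrete, the following is a minimal sketch of dual-encoder scoring, not the model proposed here. It assumes that mention and entity embeddings have already been produced by their respective encoders; the vectors are hand-made toy values, and the function name `link_mentions` and the entity IDs are hypothetical names introduced only for illustration.

```python
from typing import Dict, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product, used here as the mention-entity similarity score."""
    return sum(a * b for a, b in zip(u, v))


def link_mentions(
    mention_vecs: List[Vector], entity_vecs: Dict[str, Vector]
) -> List[str]:
    """Return the highest-scoring entity ID for every mention in a document.

    All mentions are scored against the same precomputed entity vectors in a
    single call, instead of running a separate retrieve/rerank pass for each
    mention.
    """
    entity_ids = list(entity_vecs)
    linked = []
    for mention in mention_vecs:
        scores = [dot(mention, entity_vecs[eid]) for eid in entity_ids]
        best = max(range(len(entity_ids)), key=scores.__getitem__)
        linked.append(entity_ids[best])
    return linked


# Toy stand-ins for encoder outputs (assumption: real vectors would come
# from BERT-based mention and entity encoders).
entity_vecs = {
    "ENT_DIABETES": [1.0, 0.0, 0.0],  # hypothetical entity IDs
    "ENT_ASPIRIN": [0.0, 1.0, 0.0],
    "ENT_HEADACHE": [0.0, 0.0, 1.0],
}
mention_vecs = [
    [0.9, 0.1, 0.0],  # e.g. a mention close to ENT_DIABETES
    [0.0, 0.2, 0.8],  # e.g. a mention close to ENT_HEADACHE
]

print(link_mentions(mention_vecs, entity_vecs))
```

Because the entity vectors do not depend on the mention, they can be computed once and reused, which is one plausible source of the speed advantage a dual encoder has over per-mention reranking.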