Entities lie at the heart of biomedical natural language understanding, and the biomedical entity linking (EL) task remains challenging due to fine-grained and diversiform concept names. Generative methods achieve remarkable performance in general-domain EL with less memory usage, but they require expensive pre-training. Previous biomedical EL methods leverage synonyms from knowledge bases (KBs), which are not trivial to inject into a generative method. In this work, we model biomedical EL with a generative approach and propose to inject synonym knowledge into it. We propose KB-guided pre-training, which constructs synthetic samples from KB synonyms and definitions and requires the model to recover concept names. We also propose synonyms-aware fine-tuning to select concept names for training, and a decoder prompt and a multi-synonym constrained prefix tree for inference. Our method achieves state-of-the-art results on several biomedical EL tasks without candidate selection, which demonstrates the effectiveness of the proposed pre-training and fine-tuning strategies.
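To make the constrained-inference idea concrete, below is a minimal sketch (not the authors' implementation) of a prefix tree built over tokenized concept names and their synonyms, which restricts each decoding step to tokens that keep the generated string inside some valid synonym. The class name `SynonymTrie`, the method `allowed_next_tokens`, and the example token IDs are illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence


class SynonymTrie:
    """Trie over tokenized concept names/synonyms; a node ending a synonym stores its concept ID."""

    def __init__(self) -> None:
        self.children: Dict[int, "SynonymTrie"] = {}
        self.concept_id: Optional[str] = None  # set on the node that ends a synonym

    def add(self, token_ids: Sequence[int], concept_id: str) -> None:
        """Insert one tokenized synonym and associate it with a concept ID."""
        node = self
        for tok in token_ids:
            node = node.children.setdefault(tok, SynonymTrie())
        node.concept_id = concept_id

    def allowed_next_tokens(self, prefix: Sequence[int]) -> List[int]:
        """Return the tokens that can legally follow the generated prefix."""
        node = self
        for tok in prefix:
            if tok not in node.children:
                return []  # prefix has left the trie; no valid continuation
            node = node.children[tok]
        return list(node.children.keys())


# Usage sketch: build the trie from the KB's synonym lists (token IDs come from the
# generative model's tokenizer), then plug `allowed_next_tokens` into the decoder's
# prefix-allowed-tokens hook so every generated name is a valid synonym in the KB.
trie = SynonymTrie()
trie.add([101, 202, 303], concept_id="C0004238")  # hypothetical tokenized concept name
trie.add([101, 404], concept_id="C0004238")       # a synonym of the same concept
print(trie.allowed_next_tokens([101]))            # -> [202, 404]
```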