Biomedical entity linking (EL) consists of named entity recognition (NER) and named entity disambiguation (NED). EL models are trained on corpora annotated against a predefined knowledge base (KB). In practice, however, stakeholders often care only about entities within a subset of that KB. We call this scenario partial knowledge base inference: an EL model is trained with one KB and then performs inference on a part of it without further training. In this work, we give a detailed definition and evaluation procedure for this practically valuable but significantly understudied scenario, and we evaluate methods from three representative EL paradigms. We construct partial KB inference benchmarks and observe a catastrophic degradation in EL performance caused by a dramatic drop in precision. Our findings reveal that these EL paradigms cannot correctly handle unlinkable mentions (NIL) and are therefore not robust to partial KB inference. We also propose two simple yet effective remedies that address the NIL issue with little computational overhead.
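To make the precision drop concrete, the following is a minimal toy sketch, not the paper's evaluation code. It assumes links are (mention, entity) pairs, that gold links to entities outside the partial KB become NIL and are dropped from the gold set, and that the model is forced to pick some entity from the partial KB for every mention. The entity IDs, scores, helper names, and the 0.5 score threshold are all hypothetical; the threshold is only one plausible way to reject NIL mentions and is not claimed to be either of the two proposed methods.

```python
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # (mention_id, entity_id)


def restrict_gold(gold: Set[Link], partial_kb: Set[str]) -> Set[Link]:
    """Keep only gold links whose entity is in the partial KB; the rest become NIL."""
    return {(m, e) for m, e in gold if e in partial_kb}


def precision_recall_f1(pred: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Micro precision, recall, and F1 over exact (mention, entity) matches."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


# Hypothetical toy data: the training KB has four entities, the partial KB has two.
partial_kb = {"C1", "C2"}
gold_full = {("m1", "C1"), ("m2", "C2"), ("m3", "C3"), ("m4", "C4")}

# Assumed model output when candidates are limited to the partial KB:
# m3 and m4 are unlinkable (NIL) but still get linked, with low scores.
pred_scores: Dict[Link, float] = {
    ("m1", "C1"): 0.95,
    ("m2", "C2"): 0.90,
    ("m3", "C1"): 0.40,
    ("m4", "C2"): 0.35,
}

gold_partial = restrict_gold(gold_full, partial_kb)

# Naive partial KB inference: every prediction is kept.
naive = precision_recall_f1(set(pred_scores), gold_partial)

# Hypothetical remedy: treat low-scoring links as NIL and drop them.
THRESHOLD = 0.5  # assumed value, chosen for this toy example only
kept = {link for link, score in pred_scores.items() if score >= THRESHOLD}
thresholded = precision_recall_f1(kept, gold_partial)

print("naive:", naive)
print("thresholded:", thresholded)
```

In this toy setting recall is unaffected, while the two forced links on NIL mentions halve the precision; rejecting low-confidence links removes them. Real systems will not separate so cleanly, and any threshold would need to be tuned on held-out data.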