Discovering entity mentions in text that fall outside a Knowledge Base (KB) plays a critical role in KB maintenance, but has not yet been fully explored. Current methods are mostly limited to simple threshold-based approaches and feature-based classification, and datasets for evaluation are scarce. In this work, we propose BLINKout, a new BERT-based Entity Linking (EL) method that identifies mentions without a corresponding KB entity by matching them to a special NIL entity. To this end, we integrate novel techniques including NIL representation, NIL classification, and synonym enhancement. We also propose Ontology Pruning and Versioning strategies to construct out-of-KB mentions from normal, in-KB EL datasets. Results on four datasets of clinical notes and publications show that BLINKout outperforms existing methods in detecting out-of-KB mentions for the medical ontologies UMLS and SNOMED CT.
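To make the contrast concrete, the following is a minimal sketch of the two ideas named above: the threshold-based baseline versus treating NIL as one more candidate entity, plus a simple form of ontology pruning for building out-of-KB test data. It is not BLINKout's implementation. The function names, the entity identifiers (`E1`, `E2`, ...), and the scores are all hypothetical, and in a real system the scores would come from a BERT-based encoder rather than being passed in by hand.

```python
from typing import Dict, List, Set, Tuple

NIL = "NIL"  # hypothetical identifier for the special out-of-KB entity


def link_with_threshold(scores: Dict[str, float], threshold: float) -> str:
    """Baseline: keep the top in-KB candidate only if its score clears a threshold."""
    if not scores:
        return NIL
    best = max(scores, key=lambda entity: scores[entity])
    return best if scores[best] >= threshold else NIL


def link_with_nil_candidate(scores: Dict[str, float], nil_score: float) -> str:
    """NIL-as-entity: score NIL alongside the in-KB candidates and take the argmax."""
    candidates = dict(scores)
    candidates[NIL] = nil_score  # assumed to come from a learned NIL representation
    return max(candidates, key=lambda entity: candidates[entity])


def prune_ontology(
    kb: Set[str],
    mentions: List[Tuple[str, str]],
    pruned: Set[str],
) -> Tuple[Set[str], List[Tuple[str, str]]]:
    """Remove `pruned` entities from the KB and relabel their mentions as NIL.

    `mentions` is a list of (mention_text, gold_entity) pairs from an in-KB dataset.
    """
    kept = kb - pruned
    relabelled = [(text, gold if gold in kept else NIL) for text, gold in mentions]
    return kept, relabelled


if __name__ == "__main__":
    scores = {"E1": 0.42, "E2": 0.31}  # hypothetical candidate scores
    print(link_with_threshold(scores, threshold=0.5))
    print(link_with_nil_candidate(scores, nil_score=0.6))
    print(prune_ontology({"E1", "E2", "E3"}, [("m1", "E1"), ("m2", "E2")], {"E2"}))
```

In this sketch the threshold variant needs a hand-tuned cutoff, whereas the NIL-candidate variant lets the NIL score compete directly with in-KB entities, which is the general idea behind matching mentions to a NIL entity.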