Medical knowledge bases (KBs), distilled from biomedical literature and regulatory actions, are expected to provide high-quality information to facilitate clinical decision making. Entity disambiguation (also referred to as entity linking) is considered an essential task for unlocking the wealth of such medical KBs. However, existing medical entity disambiguation methods are inadequate because of word discrepancies between the entities in the KB and the text snippets in the source documents. Recently, graph neural networks (GNNs) have proven to be very effective and provide state-of-the-art results for many real-world applications with graph-structured data. In this paper, we introduce ED-GNN, based on three representative GNNs (GraphSAGE, R-GCN, and MAGNN), for medical entity disambiguation. We develop two optimization techniques to fine-tune and improve ED-GNN. First, we introduce a novel strategy to represent entities that are mentioned in text snippets as a query graph. Second, we design an effective negative sampling strategy that identifies hard negative samples to improve the model's disambiguation capability. Compared to the best-performing state-of-the-art solutions, our ED-GNN offers an average improvement of 7.3% in terms of F1 score on five real-world datasets.
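The abstract does not spell out how hard negative samples are identified, so the following is only a minimal sketch of one common approach, not the ED-GNN procedure itself: rank the non-gold KB entities by embedding similarity to the mention and keep the closest ones as negatives. The function names (`cosine`, `hard_negatives`), the entity identifiers, and the toy two-dimensional vectors are all hypothetical and chosen purely for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hard_negatives(mention_vec, entity_vecs, gold_id, k=2):
    """Return the k non-gold entities most similar to the mention.

    Assumption: "hard" is taken to mean "close to the mention in
    embedding space"; the paper may define hardness differently.
    """
    scored = [
        (cosine(mention_vec, vec), entity_id)
        for entity_id, vec in entity_vecs.items()
        if entity_id != gold_id  # never sample the correct entity
    ]
    # Highest similarity first: these are the most confusable candidates.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entity_id for _, entity_id in scored[:k]]


# Toy embeddings (invented for illustration only).
entity_vecs = {
    "aspirin": [1.0, 0.0],
    "aspirin_overdose": [0.9, 0.1],
    "ibuprofen": [0.6, 0.8],
    "fracture": [0.0, 1.0],
}
mention_vec = [1.0, 0.05]

negatives = hard_negatives(mention_vec, entity_vecs, gold_id="aspirin", k=2)
print(negatives)
```

In this toy setup the near-duplicate entity is ranked ahead of unrelated ones, which is the intuition behind training on hard rather than random negatives: the model is forced to separate candidates that look alike on the surface.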