Medical terminology normalization aims to map clinical mentions to terminologies from a knowledge base, a task that plays an important role in analyzing Electronic Health Records (EHRs) and in many downstream tasks. In this paper, we focus on Chinese procedure terminology normalization. Terminology expressions vary widely, and one medical mention may be linked to multiple terminologies. Previous studies have explored methods such as multi-class classification and learning to rank (LTR) to sort terminologies by literal and semantic information. However, this information is inadequate for finding the right terminologies, particularly in multi-implication cases. In this work, we propose a combined recall and rank framework to solve these problems. The framework is composed of a multi-task candidate generator (MTCG), a keywords attentive ranker (KAR), and a fusion block (FB). MTCG predicts the number of implications of a mention and recalls candidates by semantic similarity. KAR is based on BERT with a keyword-attentive mechanism that focuses on keywords such as procedure sites and procedure types. FB merges the similarities from MTCG and KAR to rank the terminologies from different perspectives. Detailed experimental analysis shows that our proposed framework achieves a remarkable improvement in both performance and efficiency.
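The recall, rank, and fuse flow described above can be sketched as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: cosine similarity over precomputed embeddings stands in for MTCG's semantic recall, a keyword-weighted token overlap stands in for the BERT-based KAR, the fusion block is assumed to be a linear interpolation with a weight `alpha`, and the implication number `k` is passed in rather than predicted. All function names, the toy embeddings, and the English example terms are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical representation of one terminology: (embedding, tokens).
Term = Tuple[List[float], List[str]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumed stand-in for MTCG's semantic similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recall(mention_vec: Sequence[float], terms: Dict[str, Term], top_n: int) -> Dict[str, float]:
    """Recall stage: keep the top_n terminologies by embedding similarity."""
    scored = sorted(
        ((name, cosine(mention_vec, vec)) for name, (vec, _) in terms.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return dict(scored[:top_n])


def keyword_score(mention_tokens: List[str], cand_tokens: List[str],
                  keywords: Set[str], kw_weight: float = 2.0) -> float:
    """Hypothetical stand-in for KAR: weighted Jaccard overlap in which keyword
    tokens (e.g. procedure sites and types) count kw_weight times as much."""
    m, c = set(mention_tokens), set(cand_tokens)
    weight = {t: (kw_weight if t in keywords else 1.0) for t in m | c}
    union = sum(weight.values())
    return sum(weight[t] for t in m & c) / union if union else 0.0


def normalize(mention_vec, mention_tokens, terms, keywords, k, top_n=10, alpha=0.5):
    """Recall candidates, score them with the ranker, fuse both scores linearly
    (assumed fusion rule), and return the top-k terminologies, where k plays
    the role of the predicted implication number."""
    recalled = recall(mention_vec, terms, top_n)
    fused = {
        name: alpha * sim + (1 - alpha) * keyword_score(mention_tokens, terms[name][1], keywords)
        for name, sim in recalled.items()
    }
    return sorted(fused, key=fused.get, reverse=True)[:k]


# Toy example with made-up embeddings and English tokens.
terms = {
    "knee arthroscopy": ([0.6, 0.8], ["knee", "arthroscopy"]),
    "hip arthroscopy": ([0.8, 0.6], ["hip", "arthroscopy"]),
    "knee replacement": ([0.0, 1.0], ["knee", "replacement"]),
}
keywords = {"knee", "hip", "arthroscopy", "replacement"}
print(normalize([1.0, 0.0], ["left", "knee", "arthroscopy"], terms, keywords, k=1, top_n=2))
# prints ['knee arthroscopy']
```

In this toy example, recall alone ranks `hip arthroscopy` first (cosine 0.8 versus 0.6), but the keyword score rewards the matching procedure site `knee`, so the fused ranking returns `knee arthroscopy`. In the framework described above, `k` would come from MTCG's implication-number prediction, so a multi-implication mention would yield several terminologies instead of one.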