This paper presents a novel prototype for normalizing biomedical terms in electronic health record excerpts to the Unified Medical Language System (UMLS) Metathesaurus. Although the tool is multilingual and cross-lingual by design, we first focus on clinical text in Spanish, because no existing tool serves this language for this specific purpose. The tool uses Apache Lucene to index the Metathesaurus and to generate mapping candidates from input text. It relies on the IXA pipeline for basic language processing and resolves ambiguities with the UKB toolkit. We evaluate it by measuring its agreement with MetaMap on two English-Spanish parallel corpora. We also present a web-based interface for the tool.
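Because UMLS concept identifiers (CUIs) are language-independent, the output of the Spanish tool on one side of a parallel corpus can be compared directly with the output of MetaMap on the English side. The abstract does not state which agreement measure is used, so the following is only a minimal sketch that assumes a set-overlap measure (precision, recall, and F1 over CUIs for one aligned sentence pair). The function name `agreement` and the sample identifiers are hypothetical and serve only as illustration.

```python
from typing import Dict, Set


def agreement(tool: Set[str], reference: Set[str]) -> Dict[str, float]:
    """Set-overlap agreement between two sets of concept identifiers.

    Assumption: agreement is computed as precision/recall/F1 of the tool's
    CUIs against the reference CUIs; the measure actually used in the
    evaluation is not specified in the abstract.
    """
    shared = len(tool & reference)
    precision = shared / len(tool) if tool else 0.0
    recall = shared / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative placeholder CUIs for one aligned English-Spanish sentence pair.
spanish_tool_cuis = {"C0011849", "C0020538", "C0004238"}  # from the Spanish side
metamap_cuis = {"C0011849", "C0020538", "C0027051"}       # from the English side

scores = agreement(spanish_tool_cuis, metamap_cuis)
print(scores)
```

In a corpus-level evaluation, such per-sentence counts would typically be aggregated over all aligned pairs before the scores are computed.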