A Medical Information Extraction Workbench to Process German Clinical Text

Roland Roller, Laura Seiffe, Ammer Ayach, Sebastian Möller, Oliver Marten, Michael Mikhailov, Christoph Alt, Danilo Schmidt, Fabian Halleck, Marcel Naik, Wiebke Duettmann, Klemens Budde

From arXiv; paper under review since 2021

Background: In information extraction and natural language processing, accessible datasets are crucial for reproducing and comparing results. Publicly available implementations and tools can serve as benchmarks and facilitate the development of more complex applications. In clinical text processing, however, accessible datasets are scarce, and so are existing tools. One of the main reasons is the sensitivity of the data. The problem is even more pronounced for non-English languages. Approach: To address this situation, we introduce a workbench: a collection of German clinical text processing models. The models are trained on a de-identified corpus of German nephrology reports. Result: The presented models provide promising results on in-domain data. Moreover, we show that our models can also be successfully applied to other biomedical text in German. Our workbench is made publicly available so that it can be used out of the box, as a benchmark, or transferred to related problems.


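Information Extraction

Information extraction (IE) converts the information contained in text into a structured, table-like form. The input to an IE system is raw text; the output is a set of information items in a fixed format. These items are extracted from many different documents and then combined in a uniform representation, which is the main task of IE. The advantage of a uniform representation is that the information becomes easy to inspect and compare. IE does not attempt to understand an entire document; it analyzes only the parts that contain relevant information. Which information counts as relevant is determined by the domain scope fixed when the system is designed.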
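The raw-text-in, fixed-format-out idea described above can be sketched in a few lines of Python. This is only an illustration, not the method of the workbench, which uses trained models: the regular expression, the `DoseRecord` type, the `extract_doses` helper, and the two German example sentences are all invented for this sketch.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical rule: a capitalised word followed by a number and a dose unit.
# A real system would use trained models rather than a hand-written pattern.
DOSE_PATTERN = re.compile(
    r"\b(?P<drug>[A-Z][a-z]+)\s+(?P<amount>\d+(?:[.,]\d+)?)\s*(?P<unit>mg|g|ml)\b"
)


@dataclass(frozen=True)
class DoseRecord:
    """One fixed-format information item: which drug, how much, which unit."""
    drug: str
    amount: float
    unit: str


def extract_doses(text: str) -> List[DoseRecord]:
    """Scan raw text and return only the parts that match the dose pattern."""
    records = []
    for match in DOSE_PATTERN.finditer(text):
        # German text uses a decimal comma; normalise it before converting.
        amount = float(match.group("amount").replace(",", "."))
        records.append(DoseRecord(match.group("drug"), amount, match.group("unit")))
    return records


# Invented example sentences, not taken from any real clinical corpus.
documents = [
    "Patient erhält Tacrolimus 2,5 mg und Prednisolon 5 mg täglich.",
    "Keine Beschwerden. Kreatinin stabil.",
]

# Items from different documents are merged into one uniform table.
table = [record for doc in documents for record in extract_doses(doc)]

for record in table:
    print(record)
```

The second sentence contributes nothing to the table, which mirrors the point that IE analyzes only the relevant parts of a document rather than trying to understand all of it.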
