We describe our work on information extraction from German-language medical documents, focusing on negation detection in an architecture based on the UIMA pipeline. Building on our previous work on software modules that cover medical concepts such as diagnoses and examinations, we employ a version of the NegEx regular-expression algorithm with a large set of triggers as a baseline. We show that a significantly smaller trigger set suffices to achieve similar results, which reduces the time needed to adapt to new text types. We also examine whether dependency parsing (based on the Stanford CoreNLP model) is a good alternative, and describe the potential and shortcomings of both approaches.
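To make the trigger-based idea concrete, the following is a minimal sketch of a NegEx-style matcher in Python. It is not the system described above: the trigger list, the terminator list, the window size, and the function name `negated_concepts` are all hypothetical choices made for illustration, and concepts are assumed to be single tokens that have already been identified upstream.

```python
import re

# Hypothetical, deliberately small trigger set (NOT the list used in the paper).
PRE_TRIGGERS = ["kein", "keine", "keinen", "ohne", "ausschluss von"]
# Hypothetical terms that end the scope of a negation.
TERMINATORS = ["aber", "jedoch"]
# Assumed maximum number of tokens after a trigger that count as negated.
WINDOW = 5


def _alternation(phrases):
    # Longest phrases first so multi-word triggers win over shorter ones.
    ordered = sorted(phrases, key=len, reverse=True)
    return "|".join(re.escape(p) for p in ordered)


TRIGGER_RE = re.compile(r"\b(?:" + _alternation(PRE_TRIGGERS) + r")\b", re.IGNORECASE)
TERMINATOR_RE = re.compile(r"\b(?:" + _alternation(TERMINATORS) + r")\b", re.IGNORECASE)


def negated_concepts(sentence, concepts):
    """Return the subset of single-token concepts that fall in a negation scope."""
    negated = set()
    for trig in TRIGGER_RE.finditer(sentence):
        # The scope starts right after the trigger ...
        scope = sentence[trig.end():]
        # ... and is cut at the first terminator, if there is one.
        stop = TERMINATOR_RE.search(scope)
        if stop:
            scope = scope[:stop.start()]
        # Only the first WINDOW tokens of the scope are considered.
        window_tokens = {t.lower() for t in re.findall(r"\w+", scope)[:WINDOW]}
        for concept in concepts:
            if concept.lower() in window_tokens:
                negated.add(concept)
    return negated
```

For example, calling `negated_concepts("Kein Fieber, aber Husten.", ["Fieber", "Husten"])` marks only `Fieber` as negated, because the hypothetical terminator `aber` closes the scope before `Husten`. Keeping the trigger list as plain data is what would make a reduced trigger set cheap to swap in for a new text type.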