We present a statistical model for German medical natural language processing, trained for named entity recognition (NER) and released as an open, publicly available model. The work is a refined successor to our first GERNERMED model, which it substantially outperforms. We demonstrate that combining multiple techniques, namely transfer learning on pretrained deep language models (LM), word alignment, and neural machine translation, yields strong entity recognition performance. Because open, public medical entity recognition models for German texts are scarce, this work offers the German medical NLP research community a baseline model. Since our model is based on public English data, its weights are provided without legal restrictions on usage and distribution. The sample code and the statistical model are available at: https://github.com/frankkramer-lab/GERNERMED-pp