We study the problem of named entity recognition (NER) from electronic medical records, one of the most fundamental and critical problems in medical text mining. Medical records written by clinicians from different specialties usually contain quite different terminologies and writing styles. The differences between specialties and the cost of human annotation make it particularly difficult to train a universal medical NER system. In this paper, we propose a label-aware double transfer learning framework (La-DTL) for cross-specialty NER, so that a medical NER system designed for one specialty can be conveniently applied to another with minimal annotation effort. Transferability is supported by two components: (i) a label-aware maximum mean discrepancy (MMD) for feature representation transfer, and (ii) parameter transfer with a theoretical upper bound that is also label-aware. We conduct extensive experiments on 12 cross-specialty NER tasks. The experimental results demonstrate that La-DTL provides consistent accuracy improvements over strong baselines. In addition, promising experimental results on non-medical NER scenarios indicate that La-DTL has the potential to be adapted seamlessly to a wide range of NER tasks.
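To make the idea of a label-aware MMD concrete, the following is a minimal sketch under stated assumptions, not the formulation used in La-DTL, which the abstract does not specify. It assumes that "label-aware" means the discrepancy is computed separately for each label shared by the source and target specialties and then averaged, and it swaps in a linear kernel, for which the biased squared MMD reduces to the squared distance between the two mean feature vectors. The function names (`linear_mmd2`, `label_aware_mmd2`) and the toy features are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Vector = Sequence[float]


def mean_vector(rows: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(rows[0])
    return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]


def linear_mmd2(xs: List[Vector], ys: List[Vector]) -> float:
    """Biased squared MMD with a linear kernel: ||mean(xs) - mean(ys)||^2."""
    mx, my = mean_vector(xs), mean_vector(ys)
    return sum((a - b) ** 2 for a, b in zip(mx, my))


def group_by_label(
    feats: List[Vector], labels: List[Hashable]
) -> Dict[Hashable, List[Vector]]:
    """Collect feature vectors under the label they were tagged with."""
    groups: Dict[Hashable, List[Vector]] = defaultdict(list)
    for f, y in zip(feats, labels):
        groups[y].append(f)
    return groups


def label_aware_mmd2(
    src_feats: List[Vector],
    src_labels: List[Hashable],
    tgt_feats: List[Vector],
    tgt_labels: List[Hashable],
) -> float:
    """Average the per-label squared MMD over labels present in both domains.

    Assumption (hypothetical): labels seen in only one domain are skipped,
    and every shared label gets equal weight.
    """
    src = group_by_label(src_feats, src_labels)
    tgt = group_by_label(tgt_feats, tgt_labels)
    shared = set(src) & set(tgt)
    if not shared:
        return 0.0
    return sum(linear_mmd2(src[y], tgt[y]) for y in shared) / len(shared)


# Toy token features for a source and a target specialty (made-up values).
src_feats = [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]]
src_labels = ["O", "O", "B"]
tgt_feats = [[1.0, 0.0], [1.0, 3.0]]
tgt_labels = ["O", "B"]

loss = label_aware_mmd2(src_feats, src_labels, tgt_feats, tgt_labels)
```

In a training setup, a term of this kind would typically be added to the tagging loss so that hidden representations of tokens with the same label are pulled together across specialties; a nonlinear kernel such as an RBF kernel could replace the linear one at the cost of pairwise kernel evaluations.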