Clinical Named Entity Recognition (CNER) aims to identify and classify clinical terms such as diseases, symptoms, treatments, exams, and body parts in electronic health records, and it is a fundamental task for clinical and translational research. In recent years, deep neural networks have achieved significant success in named entity recognition and many other Natural Language Processing (NLP) tasks. Most of these models are trained end-to-end and can automatically learn features from large-scale labeled datasets. However, these data-driven methods typically struggle with rare or unseen entities. Earlier statistical methods and feature-engineering practice have shown that human knowledge can provide valuable information for handling rare and unseen cases. In this paper, we address the problem by incorporating dictionaries into deep neural networks for the Chinese CNER task. We propose two architectures that extend the Bidirectional Long Short-Term Memory (Bi-LSTM) neural network, together with five feature representation schemes, to handle the task. Experimental results on the CCKS-2017 Task 2 benchmark dataset show that the proposed method achieves highly competitive performance compared with state-of-the-art deep learning methods.
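The abstract does not spell out the five feature representation schemes. The sketch below is purely illustrative. It assumes one common way of encoding dictionary knowledge for character-level Chinese NER: binary n-gram match flags per character. The function name `ngram_dict_features`, the toy dictionary, and the `max_n = 5` cutoff are hypothetical, and this is not necessarily one of the schemes proposed here.

```python
from typing import List, Set


def ngram_dict_features(sentence: str, dictionary: Set[str], max_n: int = 5) -> List[List[int]]:
    """Hypothetical dictionary feature scheme (an assumption, for illustration).

    For every character and every n in 2..max_n, emit two binary flags:
      - 1 if the n-gram ENDING at this character is a dictionary term
      - 1 if the n-gram STARTING at this character is a dictionary term
    Each character therefore gets a vector of length 2 * (max_n - 1).
    """
    length = len(sentence)
    features: List[List[int]] = []
    for i in range(length):
        row: List[int] = []
        for n in range(2, max_n + 1):
            # n-gram ending at i; guard start >= 0 so a negative slice index
            # does not silently wrap around to the end of the string
            start = i - n + 1
            row.append(int(start >= 0 and sentence[start:i + 1] in dictionary))
            # n-gram starting at i; it must fit inside the sentence
            row.append(int(i + n <= length and sentence[i:i + n] in dictionary))
        features.append(row)
    return features


if __name__ == "__main__":
    # Toy dictionary of symptom terms (hypothetical example data)
    terms = {"头痛", "发热"}
    for char, vec in zip("患者头痛伴发热", ngram_dict_features("患者头痛伴发热", terms)):
        print(char, vec)
```

In a dictionary-augmented Bi-LSTM tagger, per-character vectors like these would typically be concatenated with the character embeddings before the recurrent layer. This lets the network use dictionary matches as extra evidence for rare terms without hard-coding the final labels.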