Identifying patient cohorts from clinical notes in secondary electronic health records is a fundamental task in clinical information management. However, with the growing number of clinical notes, it becomes challenging to analyze the data manually for phenotype detection. Automatic extraction of clinical concepts would help to identify patient phenotypes correctly. This paper proposes a novel hybrid model that uses natural language processing and deep learning to extract patient phenotypes automatically, without dictionaries or human intervention. The model is based on a neural bidirectional sequence model (BiLSTM or BiGRU) followed by a CNN layer for phenotype identification. An extra CNN layer runs in parallel to the hybrid model to extract additional features for each phenotype. We used pre-trained embeddings, FastText and Word2vec, separately as input layers to compare their performance. Experimental results on the MIMIC-III database demonstrate that the proposed model achieves a significant performance improvement over existing models in internal comparisons. The enhanced version of our model with the extra CNN layer obtained a higher F1-score than the original hybrid model. We also showed that a BiGRU layer with FastText embeddings outperformed a BiLSTM layer at identifying patient phenotypes.
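The architecture described above can be sketched as follows. This is a minimal illustrative PyTorch implementation under assumed dimensions (embedding size, hidden units, filter counts, and number of phenotype labels are placeholders, not the paper's reported configuration), showing a BiGRU whose outputs feed a CNN layer, with a second CNN branch run in parallel directly over the embeddings:

```python
import torch
import torch.nn as nn

class HybridPhenotypeModel(nn.Module):
    """Hedged sketch of the hybrid BiGRU+CNN model with a parallel CNN branch.

    All sizes below are illustrative assumptions, not the paper's settings.
    """
    def __init__(self, vocab_size=5000, emb_dim=100, hidden=64,
                 n_filters=32, kernel=3, n_phenotypes=10):
        super().__init__()
        # In the paper this layer would be initialized with pre-trained
        # FastText or Word2vec vectors; here it is randomly initialized.
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Main branch: bidirectional GRU, then a 1-D convolution over time.
        self.bigru = nn.GRU(emb_dim, hidden, bidirectional=True,
                            batch_first=True)
        self.conv_main = nn.Conv1d(2 * hidden, n_filters, kernel, padding=1)
        # Parallel branch: a CNN applied directly to the token embeddings
        # to extract extra per-phenotype features.
        self.conv_side = nn.Conv1d(emb_dim, n_filters, kernel, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)
        self.classifier = nn.Linear(2 * n_filters, n_phenotypes)

    def forward(self, token_ids):
        x = self.embed(token_ids)                      # (B, T, E)
        h, _ = self.bigru(x)                           # (B, T, 2H)
        # Conv1d expects (B, channels, T), so transpose before convolving.
        main = self.pool(torch.relu(self.conv_main(h.transpose(1, 2))))
        side = self.pool(torch.relu(self.conv_side(x.transpose(1, 2))))
        feats = torch.cat([main, side], dim=1).squeeze(-1)  # (B, 2F)
        return self.classifier(feats)  # one logit per phenotype label

model = HybridPhenotypeModel()
# A batch of 2 notes, each truncated/padded to 50 token ids.
logits = model(torch.randint(0, 5000, (2, 50)))
print(logits.shape)
```

Swapping `nn.GRU` for `nn.LSTM` yields the BiLSTM variant compared in the experiments; the parallel `conv_side` branch corresponds to the "extra CNN layer" of the enhanced model.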