Dementia is a neurodegenerative disorder that causes cognitive decline and affects more than 50 million people worldwide. It is under-diagnosed by healthcare professionals: only one in four people with dementia is diagnosed. Even when a diagnosis is made, it may not be entered in a patient's chart as a structured International Classification of Diseases (ICD) diagnosis code. Information relevant to cognitive impairment (CI) is often found within electronic health records (EHR), but manual expert review of clinician notes is both time-consuming and prone to errors. Automated mining of these notes offers an opportunity to label patients with cognitive impairment in EHR data. We developed natural language processing (NLP) tools to identify patients with cognitive impairment and demonstrate that linguistic context improves performance on the cognitive impairment classification task. We fine-tuned our attention-based deep learning model, which can learn from complex language structures, and substantially improved accuracy relative to a baseline NLP model (0.93 vs. 0.84). Further, we show that deep learning NLP can successfully identify dementia patients who have no dementia-related ICD codes or medications.