Dementia-related cognitive impairment (CI) affects over 55 million people worldwide, and its prevalence is growing rapidly, at a rate of one new case every 3 seconds. Because clinical trials have repeatedly failed, early diagnosis is crucial, yet 75% of dementia cases go undiagnosed globally, rising to as many as 90% in low- and middle-income countries. Current diagnostic methods are notoriously complex, involving manual review of medical notes, numerous cognitive tests, and expensive brain scans or spinal fluid tests. Information relevant to CI is often found in electronic health records (EHRs) and can provide vital clues for early diagnosis, but manual review by experts is tedious and error-prone. This project develops a novel, state-of-the-art automated screening pipeline for scalable, high-speed discovery of undetected CI in EHRs. To capture linguistic context from the complex language structures in EHRs, a database of 8,656 sequences was constructed to train an attention-based deep learning natural language processing model to classify sequences. A patient-level prediction model based on logistic regression was then developed using the sequence-level classifier. The deep learning system achieved 93% accuracy and an AUC of 0.98 in identifying patients who had no earlier diagnosis, dementia-related diagnosis code, or dementia-related medication in their EHR. These patients would otherwise have gone undetected or been detected too late. The EHR screening pipeline was deployed in NeuraHealthNLP, a web application for automated, real-time CI screening that requires only uploading EHRs in a browser. NeuraHealthNLP is cheaper, faster, and more accessible than current clinical methods, and it outperforms them, including text-based analytics and machine learning approaches. It makes early diagnosis viable in regions with scarce health care services but accessible internet or cellular service.
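The two-stage design described in the abstract (a sequence-level classifier feeding a patient-level logistic regression) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the project: the two patient-level features, the weights and bias, the 0.5 threshold, and the toy probabilities standing in for the output of the attention-based sequence classifier.

```python
import math
from collections import defaultdict


def patient_features(sequence_preds):
    """Aggregate (patient_id, probability) pairs into per-patient features.

    Hypothetical features: the fraction of a patient's sequences with
    probability >= 0.5, and the maximum sequence probability.
    """
    by_patient = defaultdict(list)
    for patient_id, prob in sequence_preds:
        by_patient[patient_id].append(prob)

    features = {}
    for patient_id, probs in by_patient.items():
        frac_positive = sum(p >= 0.5 for p in probs) / len(probs)
        features[patient_id] = (frac_positive, max(probs))
    return features


def patient_probability(feature_vector, weights, bias):
    """Logistic regression score: sigmoid(w . x + b)."""
    z = bias + sum(w * x for w, x in zip(weights, feature_vector))
    return 1.0 / (1.0 + math.exp(-z))


# Toy sequence-level probabilities (made up); in the real pipeline these
# would come from the trained sequence classifier.
sequence_preds = [
    ("p1", 0.9), ("p1", 0.8), ("p1", 0.2),
    ("p2", 0.1), ("p2", 0.3),
]

# Hypothetical coefficients; a real model would learn these from labeled patients.
WEIGHTS = (3.0, 2.0)
BIAS = -2.5

features = patient_features(sequence_preds)
scores = {pid: patient_probability(x, WEIGHTS, BIAS) for pid, x in features.items()}
flagged = sorted(pid for pid, s in scores.items() if s >= 0.5)
```

In this toy run, the patient with mostly high-probability sequences is flagged for follow-up and the other is not. The point of the sketch is only the shape of the data flow: many sequence-level predictions per patient are reduced to a fixed-length feature vector before the patient-level model is applied.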