Dementia-related cognitive impairment (CI) is a neurodegenerative disorder that affects over 55 million people worldwide and is growing rapidly, at a rate of one new case every 3 seconds. Globally, 75% of cases go undiagnosed, rising to as much as 90% in low- and middle-income countries, leading to an estimated annual worldwide cost of USD 1.3 trillion, forecast to reach USD 2.8 trillion by 2030. With no cure, a recurring failure of clinical trials, and a lack of early diagnosis, the mortality rate is 100%. Information in electronic health records (EHR) can provide vital clues for early detection of CI, but manual review by experts is tedious and error-prone. Several computational methods have been proposed; however, they lack an enhanced understanding of the linguistic context in the complex language structures of EHR. Therefore, I propose a novel and more accurate framework, NeuraHealth, to identify patients who had no earlier diagnosis. In NeuraHealth, using patient EHR from the Mass General Brigham BioBank, I fine-tuned a bidirectional attention-based deep learning natural language processing model to classify sequences. The sequence predictions were then used to generate structured features as input to a patient-level regularized logistic regression model. This two-step framework creates high dimensionality and outperforms all existing state-of-the-art computational methods as well as clinical methods. Further, I integrate the models into a real-world product, a web app, to create an automated EHR screening pipeline for scalable, high-speed discovery of undetected CI in EHR, making early diagnosis viable in medical facilities and in regions with scarce health services.
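The two-step structure described above (sequence-level predictions aggregated into patient-level features, followed by a regularized logistic regression) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the sequence probabilities are hard-coded stand-ins for the output of a fine-tuned sequence classifier, the three aggregate features (maximum, mean, fraction above 0.5) are hypothetical and not the features used in NeuraHealth, and the L2 penalty and plain gradient descent are one possible choice of regularized fit.

```python
import math
from collections import defaultdict


def patient_features(seq_preds):
    """Aggregate (patient_id, probability) pairs into per-patient features.

    The three features are illustrative assumptions, not the paper's.
    """
    by_patient = defaultdict(list)
    for pid, prob in seq_preds:
        by_patient[pid].append(prob)
    return {
        pid: [
            max(probs),                                  # strongest single sequence
            sum(probs) / len(probs),                     # average sequence score
            sum(p >= 0.5 for p in probs) / len(probs),   # share of positive sequences
        ]
        for pid, probs in by_patient.items()
    }


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Patient-level probability under the logistic model."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logreg_l2(X, y, lam=0.1, lr=0.5, epochs=2000):
    """L2-regularized logistic regression fitted by batch gradient descent.

    The bias term is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical sequence-level probabilities (stand-ins for step one).
seq_preds = [
    ("A", 0.9), ("A", 0.8), ("A", 0.2),
    ("B", 0.1), ("B", 0.2),
    ("C", 0.7), ("C", 0.95),
    ("D", 0.05), ("D", 0.3), ("D", 0.1),
]
labels = {"A": 1, "B": 0, "C": 1, "D": 0}  # toy patient-level labels

feats = patient_features(seq_preds)
ids = sorted(feats)
w, b = fit_logreg_l2([feats[i] for i in ids], [labels[i] for i in ids])
scores = {i: predict_proba(w, b, feats[i]) for i in ids}
```

In this toy setup, `scores` holds one probability per patient; a real pipeline would replace `seq_preds` with model outputs over EHR text sequences and would evaluate on held-out patients rather than on the training data.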