Foresight -- Deep Generative Modelling of Patient Timelines using Electronic Health Records

Zeljko Kraljevic, Dan Bean, Anthony Shek, Rebecca Bendayan, Joshua Au Yeung, Alexander Deng, Alfie Baston, Jack Ross, Esther Idowu, James T Teo, Richard J Dobson

Electronic Health Records (EHRs) hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored as unstructured text. Temporal modelling of this medical history, which considers the sequence of events, can be used to forecast and simulate future events, estimate risk, suggest alternative diagnoses or forecast complications. While most prediction approaches use mainly structured data or a subset of single-domain forecasts and outcomes, we processed the entire free-text portion of EHRs for longitudinal modelling. We present Foresight, a novel GPT3-based pipeline that uses NER+L tools (i.e. MedCAT) to convert document text into structured, coded concepts, and then provides probabilistic forecasts for future medical events such as disorders, medications, symptoms and interventions. Since large portions of EHR data are in text form, such an approach benefits from a granular and detailed view of a patient while introducing modest additional noise. In tests on two large UK hospitals (King's College Hospital; South London and Maudsley) and the US MIMIC-III dataset, precision@10 of 0.80, 0.81 and 0.91, respectively, was achieved for forecasting the next biomedical concept. Foresight was also validated on 34 synthetic patient timelines by 5 clinicians and achieved relevancy of 97% for the top forecasted candidate disorder. Foresight can be easily trained and deployed locally, as it requires only free-text data at a minimum. As a generative model, it can simulate follow-on disorders, medications and interventions for as many steps as required. Foresight is a general-purpose model for biomedical concept modelling that can be used for real-world risk estimation, virtual trials and clinical research to study the progression of diseases, simulate interventions and counterfactuals, and for educational purposes.
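To make the precision@10 figures above concrete, here is a minimal sketch of how a precision@k score for next-concept forecasting could be computed. It assumes that a forecast counts as correct when the true next concept appears among the model's top-k ranked candidates; the abstract does not spell out the exact definition, so the paper's evaluation may differ. The function name `precision_at_k` and the toy concept names are illustrative and are not taken from Foresight.

```python
from typing import Sequence


def precision_at_k(
    ranked_forecasts: Sequence[Sequence[str]],
    true_next: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of forecasts whose true next concept is in the top-k candidates.

    Assumption: one ranked candidate list per timeline position, ordered from
    most to least probable, and exactly one true next concept per position.
    """
    if len(ranked_forecasts) != len(true_next):
        raise ValueError("need one ranked list per true next concept")
    if not true_next:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_forecasts, true_next)
        if truth in ranked[:k]
    )
    return hits / len(true_next)


# Toy data (hypothetical): four forecasting steps, three ranked candidates each.
forecasts = [
    ["hypertension", "type 2 diabetes", "chest pain"],
    ["aspirin", "atorvastatin", "metformin"],
    ["fever", "cough", "pneumonia"],
    ["heart failure", "atrial fibrillation", "stroke"],
]
truths = ["type 2 diabetes", "metformin", "sepsis", "heart failure"]

for k in (1, 2, 3):
    print(f"precision@{k} = {precision_at_k(forecasts, truths, k):.2f}")
```

Under this definition the score can only stay the same or rise as k grows, since a larger candidate window can only add hits.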

