Electronic Health Records (EHRs) hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored as unstructured text. Temporal modelling of this medical history, which takes the sequence of events into account, can be used to forecast and simulate future events, estimate risk, suggest alternative diagnoses or forecast complications. While most prediction approaches rely mainly on structured data or target a subset of single-domain forecasts and outcomes, we processed the entire free-text portion of EHRs for longitudinal modelling. We present Foresight, a novel GPT3-based pipeline that uses named entity recognition and linking (NER+L) tools (i.e. MedCAT) to convert document text into structured, coded concepts, and then provides probabilistic forecasts for future medical events such as disorders, medications, symptoms and interventions. Since large portions of EHR data are in text form, such an approach benefits from a granular and detailed view of a patient while introducing modest additional noise. In tests on data from two large UK hospitals (King's College Hospital, and South London and Maudsley) and on the US MIMIC-III dataset, Foresight achieved precision@10 of 0.80, 0.81 and 0.91, respectively, for forecasting the next biomedical concept. Five clinicians also validated Foresight on 34 synthetic patient timelines, where it achieved a relevancy of 97% for the top forecasted candidate disorder. Because it requires only free-text data as a minimum, Foresight can be easily trained and deployed locally. As a generative model, it can simulate follow-on disorders, medications and interventions for as many steps as required. Foresight is a general-purpose model for biomedical concept modelling that can be used for real-world risk estimation, virtual trials and clinical research to study the progression of diseases, to simulate interventions and counterfactuals, and for educational purposes.
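To make the reported precision@10 figures concrete, the following minimal sketch shows one common reading of the metric for next-concept forecasting: a forecasting step counts as a hit if the true next concept appears among the top k ranked candidates. This is an illustrative assumption rather than the exact evaluation protocol used for Foresight; the function name `precision_at_k` and the toy concept labels are hypothetical.

```python
from typing import Sequence


def precision_at_k(
    ranked_forecasts: Sequence[Sequence[str]],
    true_next: Sequence[str],
    k: int = 10,
) -> float:
    """Fraction of forecasting steps whose true next concept is in the top-k candidates.

    Assumption: each entry of `ranked_forecasts` is a list of candidate
    concepts sorted from most to least probable, aligned with `true_next`.
    """
    if len(ranked_forecasts) != len(true_next):
        raise ValueError("forecasts and ground truth must have the same length")
    if len(true_next) == 0:
        raise ValueError("at least one forecasting step is required")
    if k < 1:
        raise ValueError("k must be a positive integer")

    # A step is a hit when the true concept is among the first k candidates.
    hits = sum(
        1
        for candidates, truth in zip(ranked_forecasts, true_next)
        if truth in candidates[:k]
    )
    return hits / len(true_next)


# Toy example with made-up concept labels (not data from the study).
toy_forecasts = [
    ["hypertension", "type 2 diabetes", "asthma"],
    ["metformin", "insulin", "aspirin"],
    ["chest pain", "cough", "fever"],
]
toy_truth = ["type 2 diabetes", "statin", "chest pain"]

print(precision_at_k(toy_forecasts, toy_truth, k=2))
```

In this toy example the first and third steps are hits at k=2, while the second is a miss because the true concept is not among the candidates at all.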