The data available in Electronic Health Records (EHRs) provides an opportunity to transform care, and the best way to improve care for one patient is to learn from the data available on all other patients. Temporal modelling of a patient's medical history, which takes into account the sequence of past events, can be used to predict future events such as the diagnosis of a new disorder or a complication of a previous or existing disorder. Most prediction approaches rely mainly on the structured data in EHRs, or are limited to a subset of single-domain predictions and outcomes. We present MedGPT, a novel transformer-based pipeline that uses Named Entity Recognition and Linking tools (i.e. MedCAT) to structure and organize the free-text portion of EHRs and to anticipate a range of future medical events (initially disorders). Since a large portion of EHR data is in text form, such an approach benefits from a granular and detailed view of a patient while introducing modest additional noise. MedGPT effectively deals with the noise and the added granularity, and achieves a precision of 0.344, 0.552 and 0.640 (vs. 0.329, 0.538 and 0.633 for an LSTM) when predicting the top 1, 3 and 5 candidate future disorders on real-world hospital data from King's College Hospital, London, UK (approximately 600k patients). We also show that our model captures medical knowledge by testing it on an experimental medical multiple-choice question answering task, and by examining the attentional focus of the model using gradient-based saliency methods.
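To make the top 1, 3 and 5 figures concrete, the following is a minimal sketch of one common way to score top-k predictions: a prediction counts as a hit when the disorder that actually occurs next is among the k highest-scored candidates. This reading of the metric is an assumption, since the abstract does not define it, and the function name `top_k_hit_rate`, the disorder labels and the scores are all hypothetical illustrations rather than anything taken from MedGPT.

```python
from typing import Dict, Sequence


def top_k_hit_rate(
    scores: Sequence[Dict[str, float]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose true next disorder is among the k top-scored candidates.

    Assumed definition for illustration only; the paper may compute
    its precision differently.
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not targets:
        return 0.0

    hits = 0
    for candidate_scores, target in zip(scores, targets):
        # Rank candidate disorders by model score, highest first, and keep k.
        top_k = sorted(candidate_scores, key=candidate_scores.get, reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


# Hypothetical toy data: two patient timelines, three candidate disorders each.
toy_scores = [
    {"diabetes": 0.6, "hypertension": 0.3, "asthma": 0.1},
    {"diabetes": 0.2, "hypertension": 0.5, "asthma": 0.3},
]
toy_targets = ["hypertension", "hypertension"]

for k in (1, 2):
    print(k, top_k_hit_rate(toy_scores, toy_targets, k))
```

Under this definition the score can only stay the same or rise as k grows, which is consistent with the increasing values reported for the top 1, 3 and 5 candidates.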