This paper studies the problem of concept and patient representations in the medical domain. We represent patient histories from Electronic Health Records (EHRs) as temporal sequences of ICD concepts and learn embeddings for these concepts in an unsupervised setup with a transformer-based neural network model. The model was trained on a collection of one million patient histories spanning six years. We assess the predictive power of the model in comparison with several baseline methods. A series of experiments on the MIMIC-III data shows the advantage of the presented model over a similar system. Further, we analyze the resulting embedding space with regard to concept relations and show how knowledge from the medical domain can be successfully transferred, in the form of patient embeddings, to the practical task of insurance scoring.