Disease risk prediction has attracted increasing attention in modern healthcare, especially with recent advances in artificial intelligence (AI). Electronic health records (EHRs), which contain heterogeneous patient information, are widely used in disease risk prediction tasks. One challenge in applying AI models to risk prediction is generating interpretable evidence that supports the prediction results without sacrificing predictive performance. To address this problem, we propose a method of jointly embedding words and labels, whereby attention modules learn the weights of words in medical notes according to their relevance to the names of the risk prediction labels. This approach improves interpretability by employing an attention mechanism and by including the names of the prediction tasks in the model. However, its application is limited to textual inputs such as medical notes. In this paper, we propose a label-dependent attention model (LDAM) to (1) improve interpretability by exploiting Clinical-BERT (a biomedical language model pre-trained on a large clinical corpus) to encode biomedically meaningful features and labels jointly; and (2) extend the idea of joint embedding to time-series data, developing a multi-modal learning framework that integrates heterogeneous information from medical notes and time-series health status indicators. To demonstrate our method, we apply LDAM to the MIMIC-III dataset to predict different disease risks. We evaluate our method both quantitatively and qualitatively: we report the predictive power of LDAM and present case studies that illustrate its interpretability.
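The idea of weighting words by their relevance to a label name can be illustrated with a minimal sketch. This is not the LDAM implementation: it assumes cosine similarity followed by a softmax as the attention score, uses random vectors in place of Clinical-BERT embeddings, and the function name `label_attention`, the example words, and the label vector are all hypothetical.

```python
import numpy as np

def label_attention(word_vecs, label_vec):
    """Weight each word by its cosine similarity to a label embedding.

    Illustrative only: the scoring function is an assumption, not the
    attention module described in the paper.
    """
    # Normalise so that dot products are cosine similarities.
    w = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    l = label_vec / np.linalg.norm(label_vec)
    scores = w @ l                          # shape: (n_words,)
    # Softmax turns the scores into attention weights that sum to 1.
    exp = np.exp(scores - scores.max())
    weights = exp / exp.sum()
    # Label-specific note representation: attention-weighted sum of word vectors.
    context = weights @ word_vecs           # shape: (dim,)
    return weights, context

# Random stand-ins for word embeddings (a real system would use a
# pre-trained encoder such as Clinical-BERT).
rng = np.random.default_rng(0)
words = ["patient", "shows", "elevated", "creatinine"]
word_vecs = rng.normal(size=(len(words), 8))

# Hypothetical label embedding, placed close to "creatinine" so that
# this word should receive the largest attention weight.
label_vec = word_vecs[3] + 0.1 * rng.normal(size=8)

weights, context = label_attention(word_vecs, label_vec)
for word, weight in zip(words, weights):
    print(f"{word:>12s}  {weight:.3f}")
```

Because the weights are tied to a named label, inspecting them per word gives the kind of evidence a case study can show: which terms in a note contributed most to a given risk prediction.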