Deep learning models have shown tremendous potential for learning representations that capture key properties of the data. This makes them strong candidates for transfer learning: exploiting commonalities between different learning tasks to transfer knowledge from one task to another. Electronic health records (EHR) research is one of the domains in which a growing number of deep learning techniques have been employed to learn clinically meaningful representations of medical concepts (such as diseases and medications). Despite this growth, approaches to benchmarking and assessing such learned representations (or embeddings) are under-investigated; this can be a serious issue when such embeddings are shared to facilitate transfer learning. In this study, we aim to (1) train some of the most prominent disease embedding techniques on comprehensive EHR data from 3.1 million patients, (2) employ qualitative and quantitative evaluation techniques to assess these embeddings, and (3) provide pre-trained disease embeddings for transfer learning. This study can be the first comprehensive approach to clinical concept embedding evaluation and can be applied to any embedding technique and any EHR concept.