Learning representations of Electronic Health Records (EHRs) is an important yet under-explored research topic. Such representations benefit various clinical decision support applications, e.g., medication outcome prediction and patient similarity search. Current approaches rely on task-specific label supervision over vectorized sequential EHR data, which does not apply to large-scale unsupervised scenarios. Recently, contrastive learning has shown great success on self-supervised representation learning problems. However, the complex temporality of EHR data often degrades its performance. To overcome these problems, we propose Graph Kernel Infomax, a self-supervised graph kernel learning approach that operates on the graphical representation of EHRs. Unlike state-of-the-art methods, we do not change the graph structure to construct augmented views. Instead, we use Kernel Subspace Augmentation to embed nodes into two geometrically different manifold views. The entire framework is trained by contrasting node and graph representations across those two manifold views using commonly used contrastive objectives. Empirically, on publicly available benchmark EHR datasets, our approach yields performance on clinical downstream tasks that exceeds the state of the art. Theoretically, varying the distance metric naturally creates different views that serve as data augmentation without changing the graph structure.
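The idea of building two views from different distance metrics, rather than from edits to the graph, can be illustrated with a small sketch. This is not the authors' implementation: the choice of a Gaussian kernel (squared Euclidean distance) and a Laplacian kernel (Manhattan distance), the anchor points, the mean-aggregation step, the feature-shuffling negatives, and all function names are assumptions made purely for illustration. The sketch only evaluates an infomax-style objective once and has no trainable parameters.

```python
import numpy as np

def rbf_view(x, anchors, gamma=0.5):
    # View A (assumed): Gaussian kernel on squared Euclidean distance to anchors.
    d2 = ((x[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def laplacian_view(x, anchors, gamma=0.5):
    # View B (assumed): Laplacian kernel on Manhattan (L1) distance to anchors.
    d1 = np.abs(x[:, None, :] - anchors[None, :, :]).sum(axis=-1)
    return np.exp(-gamma * d1)

def propagate(adj, h):
    # One mean-aggregation step over the unchanged graph (self-loops added).
    a = adj + np.eye(adj.shape[0])
    return (a / a.sum(axis=1, keepdims=True)) @ h

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_view_loss(pos, neg, other_pos, eps=1e-9):
    # Score nodes of one view against the graph summary of the other view.
    # Positives should score high, feature-shuffled negatives low.
    summary = sigmoid(other_pos.mean(axis=0))
    pos_score = sigmoid(pos @ summary)
    neg_score = sigmoid(neg @ summary)
    return -(np.log(pos_score + eps).mean() + np.log(1.0 - neg_score + eps).mean())

rng = np.random.default_rng(0)
n, d, m = 6, 4, 3                      # nodes, feature dim, anchors (all hypothetical)
x = rng.normal(size=(n, d))            # toy node features standing in for EHR nodes
adj = np.zeros((n, n))
for i in range(n):                     # toy ring graph; its structure is never altered
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
anchors = x[:m]                        # assumed anchors spanning the kernel subspace
perm = rng.permutation(n)              # shuffled features give the negative samples

pos_a = propagate(adj, rbf_view(x, anchors))
neg_a = propagate(adj, rbf_view(x[perm], anchors))
pos_b = propagate(adj, laplacian_view(x, anchors))
neg_b = propagate(adj, laplacian_view(x[perm], anchors))

# Symmetric objective: nodes of each view against the summary of the other.
loss = cross_view_loss(pos_a, neg_a, pos_b) + cross_view_loss(pos_b, neg_b, pos_a)
print(float(loss))
```

Both views share the same adjacency matrix, so the only source of augmentation in this sketch is the change of distance metric inside the kernel. A trainable encoder would replace the fixed kernels and the parameter-free scoring used here.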