This paper proposes PrTransH, an algorithm for learning embedding vectors from a medical knowledge graph built on real-world electronic medical record (EMR) data. The unique challenge in embedding such a graph is that the uncertainty of knowledge triplets blurs the border between a "correct triplet" and a "wrong triplet", which undermines the fundamental assumption of many existing algorithms. To address this challenge, three enhancements are made to the existing TransH algorithm: 1) incorporating the probability of each medical knowledge triplet into the training objective; 2) replacing the margin-based ranking loss with a unified loss computed over both valid and corrupted triplets; 3) augmenting the training data set with medical background knowledge. Experiments on a medical knowledge graph built from real-world EMR data show that PrTransH outperforms TransH on the link prediction task. To the best of our knowledge, this paper is the first to learn and verify knowledge embeddings on a probabilistic knowledge graph.
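To make the first two enhancements concrete, the following is a minimal sketch and not the paper's actual formulation. The function `transh_score` follows the standard TransH dissimilarity (project head and tail onto a relation-specific hyperplane, then translate). The function `probability_weighted_loss`, its `exp(-score)` mapping, and the squared-error form are assumptions introduced here only to illustrate how a triplet probability could replace a hard valid/corrupted label; the abstract does not specify the loss.

```python
import numpy as np


def transh_score(h, t, w_r, d_r):
    """Standard TransH dissimilarity for a triplet (h, r, t).

    h, t : entity embeddings
    w_r  : normal vector of the relation's hyperplane (normalised here)
    d_r  : translation vector of the relation on that hyperplane
    A lower score means the triplet is more plausible.
    """
    w = w_r / np.linalg.norm(w_r)
    h_perp = h - np.dot(w, h) * w  # projection of h onto the hyperplane
    t_perp = t - np.dot(w, t) * w  # projection of t onto the hyperplane
    diff = h_perp + d_r - t_perp
    return float(np.dot(diff, diff))


def probability_weighted_loss(score, probability):
    """Hypothetical unified loss (an assumption, not the paper's formula).

    The score is mapped to (0, 1] with exp(-score) and compared with the
    triplet's probability. Under this assumption a corrupted triplet can
    be passed in with probability 0, so valid and corrupted triplets go
    through the same expression instead of a margin-based ranking.
    """
    predicted = np.exp(-score)
    return float((predicted - probability) ** 2)


if __name__ == "__main__":
    h = np.array([1.0, 2.0])
    t = np.array([3.0, 5.0])
    w_r = np.array([0.0, 1.0])
    d_r = np.array([2.0, 0.0])
    s = transh_score(h, t, w_r, d_r)
    print(s, probability_weighted_loss(s, 0.8))
```

In this sketch a triplet observed with probability 0.8 is penalised when its score implies either much higher or much lower plausibility, which is one way the soft border between "correct" and "wrong" triplets could be expressed.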