Objective: Disease knowledge graphs are a way to connect, organize, and access disparate information about diseases, with numerous benefits for artificial intelligence (AI). To create knowledge graphs, it is necessary to extract knowledge from multimodal datasets in the form of relationships between disease concepts and to normalize both concepts and relationship types. Methods: We introduce REMAP, a multimodal approach for disease relation extraction and classification. The REMAP machine learning approach jointly embeds a partial, incomplete knowledge graph and a medical language dataset into a compact latent vector space, then aligns the multimodal embeddings for optimal disease relation extraction. Results: We apply the REMAP approach to a disease knowledge graph with 96,913 relations and a text dataset of 1.24 million sentences. On a dataset annotated by human experts, REMAP improves text-based disease relation extraction by 10.0% (accuracy) and 17.2% (F1-score) by fusing disease knowledge graphs with text information. Further, REMAP leverages text information to recommend new relationships in the knowledge graph, outperforming graph-based methods by 8.4% (accuracy) and 10.4% (F1-score). Conclusion: REMAP is a multimodal approach for extracting and classifying disease relationships by fusing structured knowledge and text information. REMAP provides a flexible neural architecture to easily find, access, and validate AI-driven relationships between disease concepts.
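The general idea of combining a knowledge-graph signal with a text signal to classify a disease relation can be illustrated with a small late-fusion sketch. This is not REMAP's published architecture. It is a generic illustration under stated assumptions: the graph side uses a TransE-style score (negative distance of head + relation from tail), the text side uses a dot product between a sentence embedding and a per-relation prototype vector, and the two probability distributions are mixed with a hypothetical weight `alpha`. All function names, relation types (`causes`, `treats`), and vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def transe_score(head: Vector, rel: Vector, tail: Vector) -> float:
    """Graph-side plausibility (assumed TransE-style): -||head + rel - tail||_2."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, rel, tail)))


def text_score(sentence: Vector, prototype: Vector) -> float:
    """Text-side plausibility (assumed): dot product of sentence embedding and relation prototype."""
    return sum(s * p for s, p in zip(sentence, prototype))


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a dict of relation -> score."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def fuse_and_classify(
    head: Vector,
    tail: Vector,
    sentence: Vector,
    rel_embs: Dict[str, Vector],
    prototypes: Dict[str, Vector],
    alpha: float = 0.5,
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical late fusion: alpha * P_graph + (1 - alpha) * P_text, then argmax."""
    graph_p = softmax({r: transe_score(head, e, tail) for r, e in rel_embs.items()})
    text_p = softmax({r: text_score(sentence, prototypes[r]) for r in rel_embs})
    fused = {r: alpha * graph_p[r] + (1 - alpha) * text_p[r] for r in rel_embs}
    return max(fused, key=fused.get), fused


# Toy example with hypothetical 2-D embeddings.
head = [0.0, 0.0]        # e.g. a disease concept
tail = [1.0, 0.0]        # e.g. a second disease concept
rel_embs = {"causes": [1.0, 0.0], "treats": [0.0, 1.0]}
prototypes = {"causes": [1.0, 0.0], "treats": [0.0, 1.0]}
sentence = [0.2, 0.9]    # embedding of a sentence mentioning both concepts

label, fused = fuse_and_classify(head, tail, sentence, rel_embs, prototypes)
print(label, fused)
```

In this toy case the graph side favors `causes` and the text side favors `treats`; with `alpha=0.5` the fused prediction is `causes`, while `alpha=0.0` (text only) yields `treats`. A learned model would instead train the embeddings and the fusion jointly rather than fixing `alpha` by hand.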