To date, there are no effective treatments for most neurodegenerative diseases. Knowledge graphs can provide a comprehensive, semantic representation of heterogeneous data and have been successfully leveraged in many biomedical applications, including drug repurposing. Our objective is to construct a knowledge graph from the literature to study relations between Alzheimer's disease (AD) and chemicals, drugs, and dietary supplements, in order to identify opportunities to prevent or delay neurodegenerative progression. We collected biomedical annotations and their relations, extracted by SemRep, via SemMedDB. During data preprocessing, we used both a BERT-based classifier and rule-based methods to exclude noise while preserving most AD-related semantic triples. The 1,672,110 filtered triples were used to train knowledge graph completion models (TransE, DistMult, and ComplEx) to predict candidates that might be helpful for AD treatment or prevention. Of the three models, TransE performed best (mean rank, MR = 13.45; Hits@1 = 0.306). We used time slicing to further evaluate the predictions. We found supporting evidence for most of the highly ranked candidates predicted by our model, which indicates that our approach can surface reliable new knowledge. This paper shows that our graph mining model can predict reliable new relationships between AD and other entities (dietary supplements, chemicals, and drugs). The resulting knowledge graph can facilitate data-driven knowledge discovery and the generation of novel hypotheses.
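To make the comparison among TransE, DistMult, and ComplEx and the reported metrics (MR, Hits@1) more concrete, the following is a minimal sketch of the three standard scoring functions and the two ranking metrics. It is not the authors' implementation: the function names, the choice of the L1 norm for TransE, and the strict-inequality ranking convention are assumptions made for illustration, and the abstract does not state which settings were actually used.

```python
import numpy as np

# Illustrative helpers only; names and conventions are assumptions,
# not taken from the paper.

def transe_score(h, r, t):
    # TransE: a triple (h, r, t) is plausible when h + r lies close to t.
    # The negated L1 distance is returned so that higher means more plausible.
    return -np.linalg.norm(h + r - t, ord=1, axis=-1)

def distmult_score(h, r, t):
    # DistMult: trilinear product with a diagonal relation matrix.
    return np.sum(h * r * t, axis=-1)

def complex_score(h, r, t):
    # ComplEx: real part of the trilinear product over complex embeddings,
    # using the complex conjugate of the tail embedding.
    return np.real(np.sum(h * r * np.conj(t), axis=-1))

def rank_of_true_candidate(scores, true_index):
    # Rank 1 is best: count candidates scoring strictly higher than the true one.
    scores = np.asarray(scores)
    return int(np.sum(scores > scores[true_index])) + 1

def mean_rank(ranks):
    # MR: average rank of the true entity over all test triples (lower is better).
    return float(np.mean(ranks))

def hits_at_k(ranks, k):
    # Hits@k: fraction of test triples whose true entity is ranked within the top k.
    return float(np.mean(np.asarray(ranks) <= k))
```

Under these conventions, a lower MR and a higher Hits@1 both indicate that the true entity tends to be ranked nearer the top of the candidate list, which is the sense in which the abstract reports TransE as the best of the three models.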