Graph Based Link Prediction between Human Phenotypes and Genes

Background: Deep phenotyping can be defined as learning genotype-phenotype associations and the history of human disease through detailed and precise analysis of phenotypic abnormalities. Understanding and detecting the interaction between phenotype and genotype is a fundamental step in translating precision medicine into clinical practice. Recent advances in machine learning make it possible to predict these interactions between abnormal human phenotypes and genes efficiently. Methods: In this study, we developed a framework to predict links between Human Phenotype Ontology (HPO) terms and genes. Annotation data from heterogeneous knowledge resources, i.e., Orphanet, is used to parse human phenotype-gene associations. To generate embeddings for the nodes (HPO terms and genes), an algorithm called node2vec was used. It samples nodes from the graph using random walks, then learns features over the sampled nodes to generate embeddings. These embeddings were used in the downstream task of predicting the presence of a link between two nodes, using 5 different supervised machine learning algorithms. Results: On the downstream link prediction task, the Gradient Boosting Decision Tree based model (LightGBM) achieved the best AUROC of 0.904 and AUCPR of 0.784. In addition, LightGBM achieved the best weighted F1 score of 0.87. Compared to the other 4 methods, LightGBM finds more accurate interactions/links between human phenotype and gene pairs.
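The random-walk sampling step that node2vec relies on can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the node labels (`HP:A`, `GENE1`, and so on) are hypothetical placeholders rather than real HPO terms or genes, the return and in-out parameters `p` and `q` are arbitrary, and the abstract does not state which values or walk lengths were used. Learning the embeddings from the walks (typically with a skip-gram model) is not shown.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def node2vec_walk(adj, start, length, p, q, rng):
    """One second-order biased random walk in the style of node2vec.

    Unnormalized weight of stepping from `cur` to neighbor `x`,
    given that the walk arrived at `cur` from `prev`:
      1/p if x == prev              (return to the previous node)
      1   if x is adjacent to prev  (stay close to prev)
      1/q otherwise                 (move outward)
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break  # isolated node: the walk cannot continue
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)
            elif x in adj[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def sample_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start `num_walks` walks from every node, in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, length, p, q, rng))
    return walks


# Hypothetical toy phenotype-gene graph (placeholder identifiers).
edges = [
    ("HP:A", "GENE1"),
    ("HP:A", "GENE2"),
    ("HP:B", "GENE2"),
    ("HP:B", "GENE3"),
    ("HP:C", "GENE3"),
]
adj = build_graph(edges)
walks = sample_walks(adj, num_walks=3, length=5, p=1.0, q=0.5)
```

Each walk is a sequence of node labels that can be treated like a sentence; a skip-gram model trained on these sequences would yield one embedding vector per node, and pairs of embeddings can then be combined into features for a supervised link classifier.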

Link Prediction

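Link prediction in a network is the task of estimating, from known information such as the network's nodes and structure, how likely it is that a link exists or will form between two nodes that are not yet connected. It covers both the prediction of unknown links (links that exist but have not been observed) and the prediction of future links. Research on this problem is important and valuable both in theory and in applications.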
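As a small illustration of this definition, the sketch below scores every unconnected node pair with the Jaccard coefficient of their neighbor sets, a classic structural heuristic for link prediction. It is an assumption chosen for simplicity and is not the embedding-based method described in the abstract above; the node names are arbitrary.

```python
from itertools import combinations


def jaccard_scores(edges):
    """Score each unconnected node pair by neighbor-set overlap.

    Returns a dict mapping (u, v) with u < v to
    |N(u) & N(v)| / |N(u) | N(v)|, where N(x) is the neighbor set of x.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        union = adj[u] | adj[v]
        shared = adj[u] & adj[v]
        scores[(u, v)] = len(shared) / len(union) if union else 0.0
    return scores


# Toy undirected graph with arbitrary node names.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
scores = jaccard_scores(edges)

# Candidate links ordered from most to least likely under this heuristic.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Pairs with a higher score share a larger fraction of their neighbors and are ranked as more likely links.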
