Background: Deep phenotyping can be defined as learning genotype-phenotype associations and the history of human disease through detailed and precise analysis of phenotypic abnormalities. Understanding and detecting the interaction between phenotype and genotype is a fundamental step in translating precision medicine into clinical practice. Recent advances in machine learning make it possible to predict these interactions between abnormal human phenotypes and genes efficiently.

Methods: In this study, we developed a framework to predict links between Human Phenotype Ontology (HPO) terms and genes. Annotation data from heterogeneous knowledge resources, i.e., Orphanet, were parsed to obtain human phenotype-gene associations, which form a graph whose nodes are HPO terms and genes. To generate embeddings for these nodes, we used the node2vec algorithm, which samples nodes from the graph using random walks and then learns features over the sampled nodes. The resulting embeddings were used in a downstream link prediction task, in which 5 different supervised machine learning algorithms predict whether a link is present between a pair of nodes.

Results: In the downstream link prediction task, the gradient boosting decision tree model (LightGBM) achieved the best performance, with an AUROC of 0.904, an AUCPR of 0.784, and a weighted F1 score of 0.87. Compared with the other 4 methods, LightGBM identified links between human phenotype-gene pairs more accurately.
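As a rough illustration of the random-walk sampling step mentioned in the Methods, the sketch below generates node2vec-style biased walks over a toy phenotype-gene graph using only the Python standard library. It is not the authors' implementation: the node labels (`HP_x`, `GENE_1`, and so on), the function names, and the walk settings are hypothetical, and `p` and `q` stand for node2vec's return and in-out parameters with arbitrary values. In node2vec, the sampled walks would then be passed to a skip-gram model to learn the embeddings; that step is omitted here.

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """Sample one second-order biased random walk (node2vec style)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay one hop away from prev
            else:
                weights.append(1.0 / q)  # move further away from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

def sample_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = sorted(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, length, p, q, rng))
    return walks

# Toy phenotype-gene associations; all identifiers are placeholders.
edges = [
    ("HP_x", "GENE_1"),
    ("HP_x", "GENE_2"),
    ("HP_y", "GENE_2"),
    ("HP_y", "GENE_3"),
    ("HP_z", "GENE_3"),
]

# Build an undirected adjacency map from the association list.
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

walks = sample_walks(adj, num_walks=2, length=5, p=1.0, q=0.5)
```

Each walk is a list of node labels that plays the role of a "sentence" for the embedding model. Lower `q` biases walks toward exploring outward, while lower `p` makes them more likely to backtrack.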