Background: Deep phenotyping can be defined as learning genotype-phenotype associations and the history of human disease through detailed and precise analysis of phenotypic abnormalities. Understanding and detecting the interaction between phenotype and genotype is a fundamental step in translating precision medicine into clinical practice. Recent advances in machine learning make it possible to predict these interactions between abnormal human phenotypes and genes efficiently. Methods: In this study, we developed a framework to predict links between Human Phenotype Ontology (HPO) terms and genes. Annotation data from heterogeneous knowledge resources such as Orphanet were parsed to obtain human phenotype-gene associations. To generate embeddings for the nodes (HPO terms and genes), we used the node2vec algorithm, which samples nodes from the phenotype-gene graph via random walks and then learns features over the sampled nodes. These embeddings were then used in a downstream link prediction task, in which 5 different supervised machine learning algorithms predicted whether a link exists between a pair of nodes. Results: In the downstream link prediction task, the gradient boosting decision tree model (LightGBM) achieved the best performance, with an AUROC of 0.904 and an AUCPR of 0.784. LightGBM also achieved the best weighted F1 score, 0.87. Compared with the other 4 methods, LightGBM identified links between human phenotype-gene pairs more accurately.
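As a rough illustration of the node-sampling step mentioned in Methods, the sketch below generates node2vec-style biased random walks over a small phenotype-gene graph using only the Python standard library. It is not the authors' implementation: the function names, the toy edges with placeholder identifiers, and the chosen values of the return parameter `p` and in-out parameter `q` are all assumptions made for illustration. In node2vec, walks like these would then be passed to a skip-gram model to learn the embeddings; that step and the downstream classifiers are omitted here.

```python
import random
from collections import defaultdict


def build_graph(edges):
    """Build undirected adjacency sets from (phenotype, gene) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def biased_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """Sample one node2vec-style second-order random walk.

    Unnormalised weight for stepping from cur to nxt, having come from prev:
      1/p if nxt == prev (return to the previous node),
      1   if nxt is also a neighbour of prev,
      1/q otherwise (move further away from prev).
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break  # isolated node: nothing to walk to
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = [
            1.0 / p if n == prev else (1.0 if n in adj[prev] else 1.0 / q)
            for n in nbrs
        ]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def sample_walks(adj, num_walks=10, length=5, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(biased_walk(adj, node, length, p, q, rng))
    return walks


# Toy, made-up phenotype-gene pairs (placeholder identifiers, not real annotations).
toy_edges = [
    ("HP:A", "GENE1"), ("HP:A", "GENE2"),
    ("HP:B", "GENE2"), ("HP:B", "GENE3"), ("HP:C", "GENE3"),
]
graph = build_graph(toy_edges)
walks = sample_walks(graph, num_walks=2, length=4, p=1.0, q=0.5)
```

Each walk is a list of node identifiers that alternates between phenotype and gene nodes, because the toy graph is bipartite. Lower values of `q` favour moving away from the previous node, while lower values of `p` favour stepping back to it.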