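Project title: Research on Phenotype-Gene Association Mining Based on an Integrated Heterogeneous Network
Project number: No.61300166
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology, computer technology
Project author: 谢茂强 (Xie Maoqiang)
Affiliation: Nankai University (南开大学)
Funding: 230,000 yuan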
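Chinese abstract: Predicting and analyzing phenotype-gene associations is of great significance for disease treatment, increasing food production, and similar applications, and is a core problem in bioinformatics. Random-walk methods similar to PageRank, run on genome-wide networks, have gradually become mainstream. Traditional methods, however, can mine only a single gene network and lose much structural information when they use phenotype network data and patient data. This project integrates three networks (the phenotype network, the gene network, and the phenotype-gene association network) into one heterogeneous network so as to fully preserve the structural information in each, and on that basis carries out association prediction, cluster analysis, and homologous module mining: 1) model phenotype-gene association prediction as an optimization problem, design the loss function to exploit known associations and network structure, and improve the optimization method to cope with problems such as the scarcity of known phenotype-gene associations; 2) co-cluster phenotypes and genes by maximizing the consistency between the phenotype clustering and the gene clustering, providing a cluster analysis tool for complex diseases at the level of phenotype clusters and gene clusters; 3) propose cross-species mining of phenotype-gene homologous modules, bringing results from the relatively well-studied house mouse into human phenotype-gene association mining. Research on integrated heterogeneous networks will also advance heterogeneous data mining in the big-data setting.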
Chinese keywords: Phenotype-gene association; integrated heterogeneous network; data mining; cross-species analysis; big data
English abstract: Predicting and analyzing phenotype-gene associations is a key problem in bioinformatics, since controlling phenotype by manipulating DNA has many essential applications, such as disease treatment and food production. Recently, network-based algorithms have been developed for mining potential phenotype-gene associations with the help of the global topology of biological networks. However, they can only prioritize or cluster candidate genes in a single gene network and make insufficient use of the topologies of the disease phenotype network and the phenotype-gene bipartite network. In our proposal, the phenotype similarity network, the gene network and the phenotype-gene network are integrated into one heterogeneous network in order to preserve the topologies of the original networks. Based on it, the following research will be conducted: (1) Modelling the prediction of phenotype-gene associations as an optimization problem, in which the relations between nodes of different types and the sparseness of known phenotype-gene associations should be considered in the loss function and related constraints. To solve it, an optimization algorithm called bi-random walk will be proposed. It can balance walking in the heterogeneous network and avoid the bias from the sparse associations. (2) Designing a clustering analysis tool for complex genetic diseases, which
English keywords: Phenotype-Gene Association; Integrated Heterogeneous Network; Data Mining; Cross-species Analysis; Big Data
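
Illustrative sketch (not part of the project record): the abstracts describe integrating a phenotype network, a gene network and the phenotype-gene associations into one heterogeneous network and ranking genes with PageRank-like random walks. The Python below is a minimal sketch of that general idea using a plain random walk with restart. It is not the project's bi-random walk algorithm or its optimization formulation, and the function names, the restart probability of 0.3 and the toy data are all assumptions made for illustration.

```python
def build_heterogeneous_network(pheno_sim, gene_net, assoc):
    """Stack the three networks into one symmetric block adjacency matrix.

    Node order: all phenotypes first, then all genes (an assumed convention).
    """
    n_p, n_g = len(pheno_sim), len(gene_net)
    size = n_p + n_g
    w = [[0.0] * size for _ in range(size)]
    for i in range(n_p):
        for j in range(n_p):
            w[i][j] = float(pheno_sim[i][j])
    for i in range(n_g):
        for j in range(n_g):
            w[n_p + i][n_p + j] = float(gene_net[i][j])
    for i in range(n_p):
        for j in range(n_g):
            # Known phenotype-gene associations link the two sub-networks.
            w[i][n_p + j] = w[n_p + j][i] = float(assoc[i][j])
    return w


def column_normalize(w):
    """Turn edge weights into transition probabilities (columns sum to 1)."""
    size = len(w)
    col_sums = [sum(w[r][c] for r in range(size)) for c in range(size)]
    return [[w[r][c] / col_sums[c] if col_sums[c] > 0 else 0.0
             for c in range(size)] for r in range(size)]


def random_walk_with_restart(w, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * T p + restart * p0 until p stops changing."""
    t = column_normalize(w)
    size = len(w)
    p0 = [0.0] * size
    p0[seed] = 1.0
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(t[r][c] * p[c] for c in range(size))
               + restart * p0[r] for r in range(size)]
        diff = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if diff < tol:
            break
    return p


# Toy data (hypothetical): 2 phenotypes, 3 genes.
pheno_sim = [[0, 1], [1, 0]]                  # phenotypes 0 and 1 are similar
gene_net = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # genes 0 and 1 interact; gene 2 is isolated
assoc = [[0, 0, 0], [1, 0, 0]]                # only phenotype 1 has a known gene (gene 0)

network = build_heterogeneous_network(pheno_sim, gene_net, assoc)
scores = random_walk_with_restart(network, seed=0)  # query phenotype 0
gene_scores = scores[len(pheno_sim):]               # scores for genes 0, 1, 2
```

In this toy case the query phenotype has no known gene of its own, yet the walk still reaches gene 0 through the similar phenotype and gene 1 through the gene network, while the isolated gene 2 receives no score. This shows how keeping all three structures in one network can help when known associations are sparse.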