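Project title: Research on Genome Sequence Polymorphism Analysis for Population Classification
Project number: No. 60871092
Project type: General Program
Year of approval: 2009
Discipline: Metallurgy and Metal Technology
Project author: Wang Chunyu (王春宇)
Author affiliation: Harbin Institute of Technology
Project funding: 320,000 CNY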
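Chinese abstract: "Research on Genome Sequence Polymorphism Analysis for Population Classification" (No. 60871092) is a General Program project of the National Natural Science Foundation of China (NSFC) with a research period of 3 years. Drawing on graph-theoretic algorithms and artificial intelligence theory, the project addresses two problems: the large scale of biological testing required for single nucleotide polymorphisms (SNPs), and the effect of variant sites on population structure inference. It aims chiefly at theoretical and experimental results on algorithms for mining candidate SNP sites and tag SNP sites, and on population structure inference based on SNP sites and hierarchical clustering, laying a solid theoretical and technical foundation for developing practical bioinformatics prototype software. Following a broad survey of domestic and international progress in methods for analyzing genome sequence polymorphism information, the project concentrated on effective algorithms for tagSNP mining and on hierarchical clustering methods for population structure inference. It produced a series of innovative results in candidate SNP site mining based on parameter filtering and ensemble learning, tagSNP selection algorithms based on clustering and graph models, mitochondrial DNA analysis of disease populations, distance-matrix representation of population genotype sequences, and population structure inference algorithms based on hierarchical clustering. Corresponding software was developed to facilitate the study of these algorithms and strategies. The project published 24 papers in total, of which 7 are indexed by SCI, 11 by EI, and 1 by ISTP, and it received 1 provincial-level award. A total of 6 doctoral and master's students graduated.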
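Chinese keywords: genome sequence polymorphism; single nucleotide polymorphism; tag SNP; graph model and clustering algorithms; population classification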
English abstract: The project "Research on Genome Sequence Polymorphism Analysis for Population Classification" (No. 60871092), supported by the NSFC General Program for a period of 3 years, set out to solve two problems in bioinformatics using graph algorithms and AI theory: the large scale of biological testing required for single nucleotide polymorphisms (SNPs), and the effect of variant sites on population structure inference. It carried out fundamental theoretical and experimental research on algorithms for selecting candidate SNPs and tagSNPs, and on population structure inference based on SNPs and hierarchical clustering. This work provides a solid theoretical foundation and advanced technical support for developing practical bioinformatics prototype software. Building on a broad survey of advances and trends, at home and abroad, in information processing methods for genome sequence polymorphism research, and on a review of the new characteristics of SNPs, the project focused on effective algorithms for mining tagSNPs from EST sequences and on hierarchical clustering methods for population structure inference. The project obtained a series of innovative results, including mining candidate SNPs using parameter filters and ensemble classifiers, a tagSNP selection algorithm based on hybrid clustering and a graph model, mtSNP analysis for disease population discrimination, a distance-matrix representation of population genotype data, and a population structure inference algorithm based on hierarchical clustering. A corresponding software platform was also developed to facilitate evaluation of the proposed algorithms and schemes. Over the course of the project, 24 papers were published, of which 7 are indexed by SCI, 11 by EI, and 1 by ISTP. Six students supported by the project received doctoral or master's degrees.
English keywords: genome sequence polymorphism; single nucleotide polymorphism (SNP); tagSNPs; graph model and clustering algorithm; population classification