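Project title: Prediction and Functional Analysis of Genes Containing Repetitive Sequences
Project number: No.61272250
Project type: General Program
Year approved: 2013
Discipline: Automation technology, computer technology
Project author: Chaochun Wei (韦朝春)
Affiliation: Shanghai Jiao Tong University
Funding: 800,000 CNY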
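Abstract (translated from Chinese): Repetitive sequence regions account for more than 50% of the total length of the human genome and contain protein-coding genes, disease-associated loci, and other functional elements. Existing studies show that the vast majority of differences between individual human genomes lie in repetitive regions, of which copy number variation (CNV) regions account for more than 70%. Many disease studies have already been based on genes in CNV regions, and all of them use the reference gene database RefSeq as their standard. However, because CNV regions contain large numbers of repetitive sequences, the RefSeq annotation of CNV regions is very incomplete: existing gene prediction systems must mask repetitive regions before making predictions. This project will develop a gene prediction system tailored to the characteristics of repetitive regions and use it for fine-grained gene prediction and analysis in the repetitive regions of genomes, including the human genome. Based on the project's preliminary work, the system is expected to predict several hundred human genes that are absent from existing databases. The project will select some of these genes, verify their existence, and further analyze their functions. The project will produce a more complete reference gene set, which will be of considerable significance for disease research based on genes in CNV regions and for functional studies of repetitive sequences.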
Keywords (translated from Chinese): repetitive sequences; copy number variation; gene prediction; alternative splicing; RNA-seq
English abstract: Repeat regions make up more than 50% of the human genome. These regions contain protein-coding genes, disease-associated loci, and other functional elements. The majority of the differences between human individuals are located in repeat regions, of which more than 70% are copy number variation (CNV) regions. Many disease studies are based on genes in CNV regions, and these studies were carried out using the reference gene set RefSeq. However, the annotation of genes in CNV regions is far from complete in the RefSeq database. Part of the reason is that CNV regions contain a large number of repeat regions, while current gene prediction systems need to mask repeat regions before gene prediction. A novel gene prediction system is proposed for genomic regions that contain repeats. We will apply this system to genomes including the human genome. We will experimentally validate tens of newly predicted human genes and analyze their functions. This project will generate a more complete reference gene set, which is important for research on inherited diseases based on genes around CNV regions and for the functional analysis of repeat regions.
English keywords: repetitive regions; copy number variation; gene prediction; alternative splicing; RNA-seq