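Project title: Development of High-Performance Metagenomic Data Analysis Strategies and Methods Based on Massive Samples
Project number: No.31271410
Project type: General Program
Year of approval: 2013
Discipline: Biological Sciences
Project author: Ning Kang (宁康)
Author affiliation: Huazhong University of Science and Technology (华中科技大学)
Funding: 800,000 CNY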
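Chinese abstract: Metagenomic methods based on next-generation DNA sequencing are among the most important means of understanding the structure and function of microbial communities. However, first, the volume of metagenomic data is growing explosively; second, metagenomic data are highly complex in type, source, and quality; finally, the different scientific questions addressed with metagenomic data demand analysis methods that are broadly applicable, multi-functional, and fast. Because of these challenges, metagenomic data analysis remains at an experience-driven stage and lacks systematic, reliable analysis methods that can serve as a reference. To address this bottleneck, this project selects a massive number of metagenomic samples and systematically examines how several factors, including community data type (evolutionary markers such as 16S rRNA, or whole-genome sequencing), sequence type (454 or Solexa), and sequencing depth, affect analysis strategies for different scientific questions based on metagenomic data. It then applies data mining and related methods to summarize the commonalities, particularities, and interactions of these factors' effects, and proposes a technical parameter matrix with reference value, representativeness, and generality. Finally, it develops general-purpose metagenomic data analysis methods and builds a database of the associated analysis strategy parameters and supporting data, to serve metagenomics research.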
Chinese keywords: Metagenome;Massive samples;High-performance analysis;Taxonomic and functional structure;Functional annotation
English abstract: Metagenomics based on next-generation DNA sequencing technology is one of the most important means of understanding the structure and function of microbial communities. However, first, the number of metagenomic samples is increasing rapidly; second, metagenomic datasets differ in type, source and quality; third, different biological questions require analysis methods that are multi-functional, high-speed and adaptive to different data. As a result, current metagenomic data analysis still depends on human experience, and there is an urgent need for systematic, reliable and standard metagenomic data analysis methods. To tackle this bottleneck, this study will select a large number of metagenomic samples and systematically investigate the effect of various factors that could affect the analysis strategy and results of metagenomic data analysis, such as structural complexity (taxonomy and function), data type (evolutionary markers such as 16S rRNA, or whole genome sequencing), sequence type (454 or Solexa) and sequencing depth. Then, using data mining methods, we will summarize the effect of the above factors on the results of metagenomic data analysis, and propose a biologically meaningful and representative parameter matrix that can be applied to different metagenomic data analyses. Finally
English keywords: Metagenomics;Large-scale samples;High-performance analysis;Taxonomic and functional structure;Functional annotation