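Project title: Data Mining of Massive Microbial Community Data Based on Metagenomic Similarity Computation
Project number: No.61303161
Project type: Young Scientists Fund project
Year of approval: 2014
Discipline: Automation technology; computer technology
Project author: Su Xiaoquan (苏晓泉)
Affiliation: Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences
Funding: 230,000 CNY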
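Abstract (translated from Chinese): Metagenomics is one of the most important approaches for analyzing the structure and function of microbial communities. Advances in high-throughput sequencing and the exponential growth in the number of community samples have produced massive amounts of metagenomic data. Because current research lacks efficient methods for comparing and analyzing microbial communities, metagenomic data are underused, and valuable biological information cannot be extracted from them. This project mines the structural information of massive numbers of microbial communities based on the similarity of their metagenomic data, and combines it with the sampling-environment information of each community, in order to identify the main environmental factors that drive differences in microbial community structure. The similarity between microbial communities is obtained by computing the similarity of the weighted binary phylogenetic trees of their metagenomic data, and the similarity matrix of the communities is computed in parallel on the GPGPU CUDA architecture. Using data mining methods such as analysis of the differences in environmental conditions between samples in the similarity matrix, natural clustering analysis, and correlation analysis between clustering results and environmental conditions, the project quantitatively computes the impact of environmental conditions on microbial community structure. The project also provides foundational methods and experience for big-data analysis of metagenomes.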
Keywords (translated from Chinese): Microbial community; Metagenome; High-performance computing; Data mining; Microbiome
English abstract: Metagenomics is one of the most important methods for analyzing the structure and function of microbial communities. The development of NGS technology and the exponentially increasing number of microbial community samples have produced massive metagenomic data. Because efficient methods for analyzing and comparing microbial communities are lacking and the utilization rate of metagenomic data is low, we currently cannot obtain valuable biological information from the massive data. This project aims to find the principal environmental factors that lead to structural differences among microbial communities, using data mining methods based on metagenomic similarity computation and environmental information. The similarity between microbial communities is obtained by computing the similarity of the weighted binary phylogenetic trees of the metagenomic data, and the GPGPU CUDA architecture is then used to compute in parallel the similarity matrix of massive microbial community samples. Through environmental difference analysis and clustering analysis of the similarity matrix, and correlation analysis between the clustering results and the environmental factors, we can quantitatively compute the diversity among microbial communities caused by environmental factors, and then realize the environmental fact
English keywords: Microbial community; Metagenome; High-performance computing; Data mining; Microbiome
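
Illustrative sketch: the abstract describes scoring community similarity on a weighted binary phylogenetic tree and filling a pairwise similarity matrix, but the record does not give the scoring rule. The Python below is a minimal sketch under stated assumptions, not the project's actual method: abundances are matched bottom-up, the unmatched remainder is passed to the parent discounted by `1 - branch`, and branch lengths lie in [0, 1). The names `Node`, `similarity`, and `similarity_matrix` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Node of a binary phylogenetic tree (hypothetical structure)."""
    name: str = ""                  # taxon name; only used at leaves
    branch: float = 0.0             # branch length to parent, assumed in [0, 1)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def _score(node: Node, a: Dict[str, float], b: Dict[str, float]) -> Tuple[float, float, float]:
    """Return (shared, rest_a, rest_b) for the subtree rooted at node."""
    if node.left is None and node.right is None:
        # Leaf: start from the relative abundances of this taxon.
        shared, xa, xb = 0.0, a.get(node.name, 0.0), b.get(node.name, 0.0)
    else:
        # Internal node: collect what the children could not match,
        # discounted by each child's branch length (assumed rule).
        shared, xa, xb = 0.0, 0.0, 0.0
        for child in (node.left, node.right):
            if child is None:
                continue
            s, ra, rb = _score(child, a, b)
            keep = 1.0 - child.branch
            shared += s
            xa += ra * keep
            xb += rb * keep
    common = min(xa, xb)            # abundance both samples share at this node
    return shared + common, xa - common, xb - common


def similarity(root: Node, a: Dict[str, float], b: Dict[str, float]) -> float:
    """Similarity of two samples whose abundances each sum to 1."""
    return _score(root, a, b)[0]


def similarity_matrix(root: Node, samples: List[Dict[str, float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise similarities."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Each entry depends only on samples i and j.
            matrix[i][j] = matrix[j][i] = similarity(root, samples[i], samples[j])
    return matrix
```

Because every matrix entry depends only on its own pair of samples, the entries can be computed independently, which is the property that makes a GPU-parallel implementation such as the CUDA one mentioned in the abstract a natural fit.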