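Research on Several Problems in Omics Data Analysis in Systems Biology

Project title: Research on Several Problems in Omics Data Analysis in Systems Biology

Project number: No. 11271374

Project type: General Program

Year approved: 2013

Discipline: Mathematical and Physical Sciences and Chemistry

Principal investigator: 许青松 (Xu Qingsong)

Institution: Central South University

Funding: RMB 600,000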

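Chinese abstract: This project builds on data provided by the Key Laboratory of Cancer Proteomics of the Ministry of Health at Xiangya School of Medicine and on high-throughput plasma metabolomic data from our own experiments. These are supplemented by the PubChem database of the U.S. National Center for Biotechnology Information, the UniProt protein database of the European Bioinformatics Institute, and the KEGG metabolic pathway database established by Kyoto University in Japan. With these resources, the project will carry out a systematic, in-depth study of new methods for analyzing and processing the complex omics data of systems biology. Particular emphasis is placed on screening disease-specific metabolic biomarkers and on building reliable, representative mathematical models that discriminate healthy subjects from patients. Specifically, the project will 1) develop learning methods for selecting important variables based on sure independence screening and pattern-distribution analysis; and 2) study the relationship between the metabolome and the proteome in disease, integrating, mining, and statistically analyzing information from different omics layers to find correlation patterns between the proteome and the metabolome of healthy individuals. The project will deliver a suite of methods for complex omics data analysis, biomarker screening, and modeling. These methods are intended to give omics researchers effective data-analysis tools, offer new approaches to clinical diagnosis, and promote the use of modern statistical learning methods in life-science research.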
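The abstract names sure independence screening (SIS), introduced by Fan and Lv, as the basis for variable selection. As a minimal sketch of the plain SIS step only (not the project's own stabilized variant, which the abstract does not specify), each feature is ranked by the absolute value of its marginal correlation with the response, and the top d features are kept, with d defaulting to floor(n / log n). The function name, the toy data, and the use of a 0/1 healthy/patient label as the response are illustrative assumptions.

```python
import numpy as np


def sure_independence_screening(X, y, d=None):
    """Rank features by |marginal correlation| with y and keep the top d.

    Sketch of plain SIS; d defaults to floor(n / log n).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))
    d = min(d, p)
    # Standardize columns; constant columns get sd 1 so their score is 0, not NaN.
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    Xs = (X - X.mean(axis=0)) / sd
    ys = (y - y.mean()) / y.std()
    # With both sides standardized, Xs.T @ ys / n is the Pearson correlation per feature.
    omega = Xs.T @ ys / n
    selected = np.argsort(-np.abs(omega), kind="stable")[:d]
    return selected, omega


# Hypothetical toy data: 100 samples, 2000 features, and a binary "patient"
# label driven by features 3 and 7 only.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2000))
latent = X[:, 3] - X[:, 7] + 0.3 * rng.standard_normal(100)
labels = (latent > 0).astype(int)

selected, omega = sure_independence_screening(X, labels)
print(len(selected), 3 in selected, 7 in selected)
```

SIS only reduces the dimension from p to d; a subsequent selection or modeling step (for example a penalized discriminant model) would then operate on the d retained features.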

Chinese keywords: Proteomics; Metabolomics; Biomarkers; Variable selection; Statistical learning

English abstract: With the rapid development of systems biology, a major challenge in omics research (such as proteomics and metabonomics) is mining and analyzing the high-dimensional data produced by modern instruments such as MALDI-TOF MS, 2-D LC/MS/MS, and high-resolution NMR spectrometers. Based on data from our own experiments, from the Key Laboratory of Cancer Proteomics at Central South University, and from public databases (such as PubChem of the NIH, UniProt of the EBI, and KEGG of Japan), we will carry out an in-depth study of statistical methods for high-dimensional proteomic and metabonomic data. Our aim is to develop novel statistical learning methods for mining these complex data. 1) Based on sure independence screening (SIS), we will develop new, stable methods for variable selection. The selected variables are expected to be more reliable and easier to interpret. Further effort will go into new methods for identifying biomarkers among the selected variables. 2) We will study methods for building models that recognize disease patterns and identify a single biomarker or a panel of biomarkers. The models will be built on omics data and clinical data, and the differences among the models will be studied. 3) We will apply canonical correlation analysis to develop new methods to explore the relationship …
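Point 3) proposes canonical correlation techniques for relating proteomic and metabolomic measurements. As a minimal sketch, assuming two sample-matched blocks X (e.g. protein abundances) and Y (e.g. metabolite levels) measured on the same n subjects, canonical correlation analysis (CCA) finds weight vectors a and b that maximize corr(Xa, Yb). Maximizing a'Cxy b subject to a'Cxx a = b'Cyy b = 1 reduces to taking the leading singular vectors of Cxx^(-1/2) Cxy Cyy^(-1/2). The ridge term `reg`, the function names, and the toy data are illustrative assumptions rather than the project's method; ridge regularization is a common choice when omics features outnumber samples.

```python
import numpy as np


def _inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V / np.sqrt(w)) @ V.T


def first_canonical_pair(X, Y, reg=1e-3):
    """Leading canonical weights (a, b) and correlation for ridge-regularized CCA."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Ridge term keeps the covariance matrices invertible when features outnumber samples.
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    Kx, Ky = _inv_sqrt(Cxx), _inv_sqrt(Cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    a = Kx @ U[:, 0]
    b = Ky @ Vt[0]
    return a, b, s[0]


# Hypothetical toy data: a shared latent factor z drives both blocks.
rng = np.random.default_rng(1)
n = 500
z = rng.standard_normal(n)
proteins = np.outer(z, [1.0, 0.5, 0.0, 0.0, 0.8]) + 0.5 * rng.standard_normal((n, 5))
metabolites = np.outer(z, [0.7, 0.0, 1.0, 0.3]) + 0.5 * rng.standard_normal((n, 4))

a, b, rho = first_canonical_pair(proteins, metabolites)
r = np.corrcoef(proteins @ a, metabolites @ b)[0, 1]
print(round(rho, 3), round(r, 3))
```

With a small `reg`, the returned value rho is close to the empirical correlation of the two canonical variates; larger values of `reg` trade some of that correlation for numerical stability in high-dimensional settings.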

English keywords: Proteomics; Metabolomics; Biomarker; Variable selection; Statistical learning
