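Novel Algorithms for the Statistical Analysis of LC-MS/MS Data in Proteomics

Project title: Novel Algorithms for the Statistical Analysis of LC-MS/MS Data in Proteomics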

Project number: No. 20875104

Project type: General Program

Year of approval: 2009

Discipline: Biological sciences

Project author: 梁逸曾 (Liang Yizeng)

Affiliation: 中南大学 (Central South University)

Funding: 300,000 yuan

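Chinese abstract: This project addresses current problems in the statistical analysis of LC-MS/MS data in proteomics and, building on the strengths of our research group, investigates new algorithms for proteomics data processing in depth. Machine learning methods, combined with mass spectral interpretation algorithms, are to be applied to the statistical analysis of large volumes of LC-MS/MS data in order to study systematically the preprocessing of hyphenated chromatographic data, the theoretical prediction of peptide mass spectra, the matching of LC-MS/MS data to peptides, and the matching of peptides to proteins. On this basis, a Bayesian statistical approach will be used to construct a new statistical method for estimating the false positive rate of all matching results. Systematic analysis of the data is intended to enable accurate qualitative and quantitative analysis of the proteins in biological samples, thereby laying a foundation for a comprehensive picture of the biological functions of the proteome and deepening our understanding of the nature of life processes. Data mining of the proline-containing spectra in the NIST peptide mass spectral library showed that proline also exhibits strong cleavage selectivity when it lies in the middle of a peptide or is bonded to leucine or isoleucine; the amino acid cleavage diagram built from these results provides a strong experimental basis for predicting peptide mass spectra. Mining of scrambled ions showed that they strongly affect peptide identification, so scrambled ions can be used in peptide spectral library searching to improve the accuracy of peptide identification.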

Chinese keywords: Proteomics; Bioinformatics; Data mining; LC-MS/MS data; Chemometrics

English abstract: This project focuses on difficult problems in the analysis of proteomics data from LC-MS/MS. Drawing on our research group's strengths in mass spectral elucidation and chemometrics, it develops new algorithms to improve the accuracy of peptide and protein identification. Machine learning methods developed in biostatistics, such as CART, CARS and boosting, were used to mine information from large amounts of LC-MS/MS data. First, more than 130,000 tandem mass spectra of proline-containing peptides, extracted from the human and E. coli peptide libraries in the NIST libraries, were investigated. To characterize the fragmentation behavior of proline quantitatively, the probability of selective cleavage at the N-terminal side of Pro was calculated for each node. The bifurcations of the resulting diagram show that cleavage at the N-terminal side of Pro is significantly influenced by proton mobility in the peptide and requires a proton to be located at that site. When protons are mobile, cleavage at the N-terminal side of Pro is determined by the pairwise Xxx-Pro cleavage and by the position of proline in the peptide. Other pathways that influence the fragmentation, such as the aspartic acid effect and the yN-2/b2 pathway, were also found. Second, more than 390,000 high-quality CID mass spectra were mined to explore the extent of scrambled ions in tandem mass spectra and the fragmentation rules during scrambling reactions of b and a ions. For both scrambled b and a ions, preferential re-opening sites were found at the aliphatic residues Ala, Ile and Leu and at other residues such as Met, Gln, Ser, Phe and Thr, whereas disfavored sites were found at the basic residues Arg, Lys and His and at residues with bulky side chains such as Trp. In the loss of internal residues, the preference order of the 20 residues was similar to that of the re-opening reaction when cleavage occurs at the C-terminal side. However, when cleavage occurs at the N-terminal side of the residues, Glu, Phe and Trp become the most preferred sites, while the basic residues remain disfavored. These results provide deep insight into the cleavage rules of scrambling reactions for the prediction of peptide mass spectra. Finally, we investigated whether scrambled ions could help discriminate false identifications from correct ones.

English keywords: Data mining; Proteomics; LC-MS/MS data; Chemometrics; Bioinformatics
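
The abstracts refer to the theoretical prediction of peptide mass spectra and to cleavage on the N-terminal side of Pro (the Xxx-Pro bond). As a minimal illustration of the bookkeeping behind those terms, and not the project's own algorithm, the Python sketch below computes the m/z values of singly charged b and y ions from standard monoisotopic residue masses and lists the Xxx-Pro bonds in a sequence. The function names `fragment_mz` and `xxx_pro_bonds` are hypothetical, and the sketch assumes unmodified peptides and ignores fragment intensities, which are what the cleavage statistics in the abstract actually describe.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da


def fragment_mz(peptide):
    """Return (b_ions, y_ions): m/z of singly charged b1..b(n-1) and y1..y(n-1)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:             # N-terminal prefixes
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):    # C-terminal suffixes
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def xxx_pro_bonds(peptide):
    """List (i, 'XP') for each bond on the N-terminal side of Pro.

    Cleaving bond i yields the complementary pair b_i and y_(n-i).
    """
    return [(i, peptide[i - 1:i + 1])
            for i in range(1, len(peptide)) if peptide[i] == "P"]
```

For example, `fragment_mz("PEPTIDE")` returns six b ions and six y ions, and `xxx_pro_bonds("PEPTIDE")` reports the single Glu-Pro bond whose cleavage gives b2 and y5. A real predictor would additionally have to model how likely each bond is to cleave, which is where cleavage statistics such as those described in the abstract would enter.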
