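Project title: Research on Spectral Library Design and Theoretical Spectrum Prediction Based on the Mass Spectrum Dictionary Concept
Project number: No.31270834
Project type: General Program
Approval year: 2013
Discipline: Biological Sciences
Project author: Sun Shiwei (孙世伟)
Affiliation: Institute of Computing Technology, Chinese Academy of Sciences
Funding: 700,000 CNY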
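Chinese abstract (translated): Mass-spectrometry-based sequence identification is an important tool in proteomics. Existing sequence database searching and de novo sequencing are both limited by the accuracy of theoretical spectrum prediction. Spectral library searching can in principle avoid the difficulty of predicting theoretical spectra, but because a library contains only a limited number of spectra, queries often fail. This project adopts a "mass spectrum dictionary" strategy to overcome these difficulties. First, we study how conserved the fragmentation patterns of peptide segments are, and collect the segments with conserved fragmentation patterns into a mass spectrum dictionary. Second, for segments not included in the dictionary, we build a statistical model based on the "mobile proton" hypothesis to predict their local theoretical spectra. Finally, for a query spectrum, we use the fragmentation patterns stored in the dictionary to annotate the peptide segments it may contain, and then assemble these annotations into a complete peptide. The advantage of this strategy is that even when the query spectrum as a whole is absent from the spectral library, its local sub-spectra may still be present in the peptide segment dictionary. Preliminary results show that short peptide segments have strongly conserved fragmentation patterns, that a small peptide segment dictionary suffices to annotate the vast majority of spectra, and that the statistical model predicts the theoretical spectra of peptide segments with high accuracy. This research will help improve the accuracy of spectral library methods and broaden their range of application.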
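To make the dictionary-annotation step more concrete, here is a minimal sketch, not the project's actual method. It assumes that each dictionary entry maps a short peptide segment to an expected relative-intensity pattern over the peaks of its singly charged fragment-ion ladder, and that a segment annotates the query spectrum when a run of peaks spaced by its residue masses is present and the observed intensities match the stored pattern by cosine similarity. The names `annotate` and `Annotation`, the tolerance, the score threshold and the reduced residue-mass table are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass

# Monoisotopic residue masses (Da) for a subset of amino acids (illustrative subset only).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "N": 114.04293,
    "D": 115.02694, "K": 128.09496, "E": 129.04259, "R": 156.10111,
}


@dataclass
class Annotation:
    segment: str     # peptide segment matched from the dictionary
    start_mz: float  # m/z of the peak where the segment's ion ladder begins
    score: float     # cosine similarity to the stored intensity pattern


def cosine(u, v):
    """Cosine similarity of two equal-length intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def find_peak(peaks, mz, tol):
    """Return the intensity of the peak closest to mz within tol, or None."""
    best = None
    for p_mz, inten in peaks:
        d = abs(p_mz - mz)
        if d <= tol and (best is None or d < best[0]):
            best = (d, inten)
    return None if best is None else best[1]


def annotate(peaks, dictionary, tol=0.02, min_score=0.9):
    """Annotate a query spectrum with segments from a (hypothetical) dictionary.

    peaks: list of (m/z, intensity) pairs for singly charged fragment ions.
    dictionary: segment -> expected relative intensities of the
        len(segment) + 1 peaks of its fragment-ion ladder.
    """
    hits = []
    for segment, pattern in dictionary.items():
        for start_mz, start_int in peaks:
            observed = [start_int]
            mz = start_mz
            for aa in segment:
                mz += RESIDUE_MASS[aa]  # next rung of the ladder
                inten = find_peak(peaks, mz, tol)
                if inten is None:
                    break               # ladder interrupted: no match here
                observed.append(inten)
            else:
                # Full ladder present: compare intensities with the stored pattern.
                score = cosine(observed, pattern)
                if score >= min_score:
                    hits.append(Annotation(segment, start_mz, score))
    return hits
```

For example, a spectrum with peaks at m/z 100.0, 157.02146, 228.05857 and 315.0906 (intensities 10, 20, 20, 10) and the dictionary `{"GAS": [1, 2, 2, 1]}` yields a single annotation for "GAS" starting at m/z 100.0 with a score close to 1. Matching on intensity shape as well as mass spacing is what lets a conserved fragmentation pattern, rather than mass alone, drive the annotation.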
Chinese keywords (translated): proteomics; mass spectrometry; big data; theoretical spectrum;
English abstract: Tandem mass spectrometry has emerged as one of the most effective techniques for protein sequence identification. Both the database-searching technique and the de novo technique suffer from the low accuracy of theoretical spectrum prediction. In theory, the spectral database technique escapes this difficulty; however, the limited number of known spectra in a spectral database usually leads to failure when searching for a query spectrum. This study aims to circumvent this difficulty using a "spectra dictionary" technique. Specifically, we first investigate the conservation of fragmentation patterns of peptide segments, and gather the segments with conserved fragmentation patterns into a spectra dictionary. A segment may still be absent from the dictionary; for such segments, a statistical model is proposed to predict their fragmentation patterns according to the "mobile proton" hypothesis. Finally, the query spectrum is annotated with peptide segment candidates by searching the dictionary, and the full-length peptide sequence is assembled from these segment candidates. Preliminary experimental results suggest that: 1) peptide segments usually demonstrate conserved fragmentation patterns; 2) nearly all spectra can be explained even using a small
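One simple way to picture the final assembly step is to chain segment annotations whose boundary peaks coincide. The sketch below assumes each hit records the m/z of the peaks where its ion ladder starts and ends, and selects the longest chain (by residue count) with dynamic programming. `SegmentHit`, `combine` and the longest-chain rule are hypothetical illustrations, not the combination procedure used in the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentHit:
    segment: str     # annotated peptide segment
    start_mz: float  # m/z of the peak where the segment's ladder starts
    end_mz: float    # m/z of the peak where the segment's ladder ends


def combine(hits, tol=0.02):
    """Assemble segment hits into one sequence (hypothetical longest-chain rule).

    Hit j may follow hit i when i's end peak lies within tol of j's start peak.
    Returns the concatenated segments of the chain covering the most residues.
    """
    if not hits:
        return ""
    # Sorting by start m/z guarantees every valid predecessor comes first.
    hits = sorted(hits, key=lambda h: h.start_mz)
    n = len(hits)
    best_len = [len(h.segment) for h in hits]  # best chain length ending at each hit
    prev = [None] * n                          # back-pointers for reconstruction
    for j in range(n):
        for i in range(j):
            if abs(hits[i].end_mz - hits[j].start_mz) <= tol:
                cand = best_len[i] + len(hits[j].segment)
                if cand > best_len[j]:
                    best_len[j] = cand
                    prev[j] = i
    # Walk back from the hit that ends the longest chain.
    j = max(range(n), key=lambda k: best_len[k])
    parts = []
    while j is not None:
        parts.append(hits[j].segment)
        j = prev[j]
    return "".join(reversed(parts))
```

With hits for "GAS" (m/z 100.0 to 315.0906), "PV" (m/z 315.0906 to 511.21177) and an unrelated "TL" starting at m/z 600.0, `combine` returns "GASPV". A real assembler would also need to tolerate gaps between segments and mixed ion series, which this sketch ignores.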
English keywords: proteomics; mass spectrometry; big data; theoretical spectrum;