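Project title: Novel methods for mining large-scale high-resolution mass spectrometry data
Project number: No. 21305163
Project type: Young Scientists Fund
Approval year: 2014
Discipline: Mathematical and Physical Sciences, and Chemistry
Project author: Zhang Zhimin (张志敏)
Affiliation: Central South University (中南大学)
Funding: CNY 250,000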
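Chinese abstract: High-resolution mass spectrometry (HRMS) plays a very important role in structure elucidation. However, mining discriminant markers from large-scale GC-MS or LC-MS data through preprocessing and pattern recognition, and then identifying them by HRMS, remains one of the key difficulties in the analysis of complex systems. At present, markers are mainly found by preprocessing and pattern recognition, and their identification relies on library searching. Current preprocessing methods are time-consuming and highly subjective, and the coverage of spectral libraries is limited, so new methods for preprocessing, pattern recognition and identification are needed. Supported by a high-performance computing platform, this project will implement automatic baseline correction, peak detection, multivariate resolution and alignment methods for HRMS, so that two-dimensional matrices for pattern recognition can be rapidly extracted from hyphenated data. Markers will then be identified with methods such as random forests and sparse linear discriminant analysis. Markers that cannot be identified with MS libraries will be characterized qualitatively using HRMS accurate mass, mass spectra calibration, isotopic abundance, the PubChem database, retention indices and theoretical fragmentation rules. Successful completion of the project will provide better methods for analyzing and mining HRMS data from complex systems, which is of strong practical significance for current research hotspots such as metabolomics, food safety and the active components of natural medicines.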
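The preprocessing steps named above (baseline correction followed by peak detection) can be illustrated with a minimal sketch. It assumes a single-channel chromatogram stored as a list of intensities and deliberately swaps in simple techniques: a rolling-minimum baseline and local-maximum peak picking with a height threshold. The function names, window width and threshold are hypothetical; the abstract does not specify the project's own algorithms.

```python
import math


def rolling_min_baseline(signal, half_window):
    """Hypothetical helper: sliding-window minimum, smoothed by a moving average."""
    n = len(signal)
    mins = [min(signal[max(0, i - half_window): i + half_window + 1]) for i in range(n)]
    baseline = []
    for i in range(n):
        window = mins[max(0, i - half_window): i + half_window + 1]
        baseline.append(sum(window) / len(window))
    # Never let the baseline rise above the data itself
    return [min(b, s) for b, s in zip(baseline, signal)]


def detect_peaks(signal, min_height):
    """Hypothetical helper: indices of local maxima at least min_height tall."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] >= min_height
            and signal[i] > signal[i - 1]
            and signal[i] >= signal[i + 1]]


# Synthetic chromatogram: linear drift plus two Gaussian peaks (sigma = 3 scans)
chrom = [0.01 * i
         + 10 * math.exp(-((i - 50) ** 2) / 18)
         + 10 * math.exp(-((i - 120) ** 2) / 18)
         for i in range(200)]
base = rolling_min_baseline(chrom, half_window=15)
corrected = [s - b for s, b in zip(chrom, base)]
peaks = detect_peaks(corrected, min_height=1.0)
print(peaks)
```

With a drifting baseline the window minimum sits at the window's low edge, so the estimate lags the true baseline by a small constant. That is harmless for locating peaks but would bias peak areas, which is one reason more careful baseline-fitting methods are worth developing.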
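Chinese keywords: Chemometrics; High-resolution mass spectrometry; Data mining; High-performance computing; Liquid chromatography–mass spectrometry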
English abstract: High-resolution mass spectrometry (HRMS) plays an important role in structure elucidation. However, mining discriminant markers from large-scale GC-MS or LC-MS datasets and identifying them via HRMS are still difficult for most researchers. At present, markers are often discovered by manual preprocessing and pattern recognition and then identified by searching MS libraries. This procedure is time-consuming and subjective, and the spectra in MS libraries are limited, so novel methods for preprocessing, pattern recognition and identification are urgently needed. In this project, we will implement baseline fitting, peak detection, automatic deconvolution and alignment methods based on high-performance computing techniques to construct 2D matrices for pattern recognition, together with the corresponding HRMS for structure elucidation. Random forests or sparse linear discriminant analysis will then be employed to discover influential markers effectively. For markers not included in MS libraries, accurate m/z values, mass spectra calibration, isotopic abundance, the PubChem database, retention indices and in silico fragmentation will be used to identify molecular formulas and structures. This study can provide a novel and systematic platform for analyzing and mining HRMS datasets of complex systems, which is meaningful for current research hotspots such as metabolomics, food safety and the active components of natural medicines.
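The accurate-mass step for markers missing from MS libraries can be sketched as a brute-force search for elemental formulas. This minimal sketch assumes the adduct has already been removed to give a neutral monoisotopic mass, restricts elements to C, H, N and O, and uses a hypothetical 5 ppm tolerance with a ring-plus-double-bond-equivalent (RDBE) plausibility filter. Isotopic-abundance, retention-index and fragmentation scoring from the abstract are not included, and `candidate_formulas` is an illustrative name, not part of any cited tool.

```python
from itertools import product

# Monoisotopic masses of the most abundant isotopes (Da)
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def rdbe(c, h, n):
    """Ring-plus-double-bond equivalents of a CcHhNnOo formula."""
    return c - h / 2 + n / 2 + 1


def candidate_formulas(neutral_mass, tol_ppm=5.0, max_c=30, max_n=5, max_o=15):
    """Enumerate CcHhNnOo formulas within tol_ppm of a neutral monoisotopic mass.

    The tolerance and element ranges are hypothetical defaults for illustration.
    """
    hits = []
    for c, n, o in product(range(max_c + 1), range(max_n + 1), range(max_o + 1)):
        rest = neutral_mass - c * MONO["C"] - n * MONO["N"] - o * MONO["O"]
        if rest < 0:
            continue
        # Solve for H directly: only one H count can fit a few-ppm window
        h = round(rest / MONO["H"])
        theo = c * MONO["C"] + h * MONO["H"] + n * MONO["N"] + o * MONO["O"]
        err = ppm_error(neutral_mass, theo)
        r = rdbe(c, h, n)
        # Keep chemically plausible neutral formulas: whole-number, non-negative RDBE
        if abs(err) <= tol_ppm and r >= 0 and r == int(r):
            formula = "".join(f"{el}{k if k > 1 else ''}"
                              for el, k in zip("CHNO", (c, h, n, o)) if k)
            hits.append((formula, round(err, 2)))
    return sorted(hits, key=lambda hit: abs(hit[1]))


print(candidate_formulas(180.06339)[:5])
```

Solving for the hydrogen count from the remaining mass, rather than looping over it, keeps the search to one pass over the C, N and O counts. Candidates that survive this filter would still need to be ranked with the other evidence listed in the abstract.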
English keywords: Chemometrics; High-resolution mass spectrometry; Data mining; High-performance computing; Liquid chromatography–mass spectrometry