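Project title: Establishment and Evaluation of a Data Integration Platform for Gastric Cancer Serum Protein Mass Spectra Based on High-Dimensional Nonlinear Methods
Project number: No.30872957
Project type: General Program
Year of approval: 2009
Project discipline: Metallurgy and Metalworking
Project author: Huang Jian (黄建)
Author affiliation: Zhejiang University
Project funding: 340,000 yuan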
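Chinese abstract: SELDI-TOF-MS is a high-throughput detection method. Studies have shown that it can improve diagnostic performance in many tumors, including gastric cancer, but three problems have also been found: 1) poor reproducibility; 2) information loss during data preprocessing; 3) model overfitting. Quality control and standardization of each step have therefore become important problems in urgent need of a solution. Once we had established complete operating procedures for the stages before and during the experiment, standardization of the analysis method became the key issue. Conventional data analysis methods mainly include SVM, PCA+LDA, decision trees, and neural networks, but all have shortcomings. Given the nonlinear characteristics of SELDI-TOF mass spectra, such as instability, imbalance, disorder, and non-uniformity, we adopted several nonlinear methods, including fractal dimension, wavelet analysis, pattern clustering, and a range of data mining techniques. By simplifying data preprocessing and reducing information loss, we obtained more accurate results, which supports the feasibility of the approach. After correction, model test performance improved to a sensitivity of 111/150 and a specificity of 111/150, averaging 74%; in cross-validation, the gastric cancer diagnostic model predicted with 85% specificity and 90% sensitivity. By further applying high-dimensional nonlinear methods in combination with clinicopathological parameters, we discovered a serum marker associated with gastric cancer (patent granted and SCI paper published) and developed serum protein mass spectrum analysis software, with independent intellectual property rights, that can accurately identify gastric cancer samples provided at different times and by different laboratories (software copyright), laying a foundation for clinical application. In addition, we built a multidisciplinary team and trained several graduate students.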
Chinese keywords: nonlinear methods; gastric cancer; serum proteomics; SELDI-TOF-MS
English abstract: SELDI-TOF-MS, a high-throughput detection method, can improve diagnostic performance in many tumors including gastric cancer, according to the literature and our previous study, but it also has the following shortcomings: 1) poor reproducibility of experimental results; 2) information loss in data preprocessing; 3) overfitting. Quality control and standardization of the approach have therefore become an urgent problem. With a standard experimental protocol already established in our previous research, standardization of the analytical methods for data mining became critical. Several conventional methods are currently used for data processing, such as SVM, PCA+LDA, decision trees, and neural networks, but all have limitations. Because SELDI-TOF mass spectra show nonlinear characteristics such as instability, imbalance, disorder, and non-uniformity, we applied a novel high-dimensional nonlinear approach. The analytical methods included fractal dimension, wavelet analysis, and a variety of data clustering techniques, which simplify data preprocessing, reduce information loss, and give more accurate results. After adjustment, model test performance improved to a sensitivity of 111/150 (74%) and a specificity of 111/150 (74%). In cross-validation, the gastric cancer diagnosis model predicted with 85% specificity and 90% sensitivity. We then applied the high-dimensional nonlinear approach to gastric cancer samples together with their clinicopathological data. Through the efforts of our multidisciplinary team, we identified a novel serum marker for prognosis prediction (patented, with an SCI paper published) and developed software for processing data from different times and laboratories (software copyright). In conclusion, the reproducibility of SELDI-TOF data from different times and laboratories was greatly improved, and gastric cancer could be diagnosed and validated properly with our nonlinear approaches.
English keywords: nonlinear method; gastric cancer; serum proteomics; SELDI-TOF-MS