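Project title: Research on multi-level fusion and model evaluation methods for metabolomics data
Project number: No.21465016
Project type: Regional Science Fund Project
Year of approval: 2015
Discipline: Mathematical, physical, and chemical sciences
Project author: Yi Lunzhao (易伦朝)
Affiliation: Kunming University of Science and Technology (昆明理工大学)
Funding: 490,000 yuan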
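Abstract (translated from the Chinese): With the rapid development of instrumental analysis techniques, acquiring high-throughput metabolomics data is no longer difficult. What follows is the question of how to solve the many practical problems that these massive real-world datasets bring. A notable characteristic of high-throughput data is that the number of variables far exceeds the number of samples, which creates a series of difficulties for data fusion and model evaluation. Building on our previous metabolomics research, this project intends to develop a series of new chemometric data fusion algorithms and strategies according to the fusion requirements of metabolomics data of different types and at different levels. It will reveal the characteristics of different datasets and the intrinsic patterns of variation of variables across datasets, and establish robust data fusion models, providing technical support for multi-center, large-scale metabolomics research. In addition, for model evaluation, a fundamental problem of great concern in chemometrics, chemoinformatics, and bioinformatics, the project will start from the statistical distributions of model parameters to identify the key factors related to evaluating a model's predictive ability, and will establish new methods and new indices for model evaluation. The methods developed will be applied to metabolomics studies of disease to build disease classification models with good predictive ability, providing new tools for clinical diagnosis and prognosis prediction.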
Keywords (translated from the Chinese): chemometrics;metabolomics;data fusion;model evaluation
English abstract: With the rapid development of modern instrumental analytical technologies, obtaining high-throughput metabolomics datasets is no longer difficult. However, many new questions have followed. A great challenge is how to deal with the practical problems that come with these massive real-world datasets. A notable characteristic of high-throughput datasets is that the number of variables is much larger than the number of samples. This characteristic causes many difficulties in data fusion and model evaluation. In this project, building on our previous metabolomics research, a series of chemometric data fusion algorithms and strategies will be proposed to meet the requirements of fusing metabolomics data of different types and at different levels. These methods will be applied to build robust data fusion models. Furthermore, we will try to reveal the features of datasets obtained from different sources and the intrinsic patterns of variation of their variables. This will provide technical support for multi-center, large-scale metabolomics research. Model evaluation is a fundamental question for chemometrics, chemoinformatics, and bioinformatics. It is also a key question in metabolomics data processing. In this project, we will approach this problem from a new angle. The statistical distributions of model parameters will be employed to identify the key factors related to evaluating the predictive ability of a model. On this basis, new methods and new indices will be proposed for model evaluation. The new algorithms and strategies proposed in the project will be applied to disease metabolomics research. They will help us build disease pattern models with good predictive ability, which might complement or provide an alternative for clinical diagnosis and prognostic prediction.
English keywords: chemometrics;metabolomics;data fusion;model evaluation