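Project title: Fundamental research on several important problems in chemical modeling
Project number: No.21275164
Project type: General Program (面上项目)
Year of approval: 2013
Discipline: Mathematical and physical sciences and chemistry
Project author: Liang Yizeng (梁逸曾)
Affiliation: Central South University (中南大学)
Funding: 800,000 CNY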
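Abstract (translated from the Chinese): A primary goal of chemometrics and chemoinformatics research is to build effective and reliable chemical models for predicting unknown chemical samples. Many chemometric methods are now available, such as principal component regression, partial least squares, support vector machines, artificial neural networks, and classification and regression trees, but research on how to evaluate the resulting models effectively is badly lacking. Evaluation still relies mainly on cross-validation, yet many researchers have pointed out that relying on cross-validation alone has serious shortcomings. In addition, how to define a model's reliable applicability domain once the model has been built is rarely reported; if the applicability domain cannot be defined effectively, the usability of the model is severely compromised, which is a major obstacle to its practical application. Furthermore, cases in which the number of samples is far smaller than the number of variables are very common in spectral analysis, metabolomics, and pattern recognition, and this many-variables, few-samples situation is a shared and challenging problem for chemometrics, chemoinformatics, and bioinformatics. This project will systematically study these fundamental problems.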
Keywords (translated from the Chinese): chemometrics; chemoinformatics; model evaluation; variable selection; application
English abstract: One of the main aims of chemometrics and chemoinformatics is to build effective and reliable chemical models for predicting unknown samples. Many chemometric algorithms are now available, such as principal component regression, partial least squares, support vector machines, artificial neural networks, and classification and regression trees (CART). However, there are few methods for effectively assessing the resulting models. Cross-validation is the main tool for this purpose, yet it has drawn many critiques, especially leave-one-out cross-validation, and methods based on cross-validation tend to be too optimistic. Moreover, the domain of applicability of the resulting models is not well defined. Without a defined domain of applicability, it is very difficult to put a model to real use, and this is now the main obstacle to the use of chemical models. In addition, it is now common in chemical modeling for the number of variables to be much greater than the number of samples, which is a major challenge and an open question in chemometrics, chemoinformatics, and bioinformatics. This project will systematically investigate these important problems in chemometrics.
English keywords: Chemometrics;Chemoinformatics;Model Assessment;Variable Selection;Application