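Project title: Research on Variance-Regularized Model Selection Methods for Classification
Project number: No.61503228
Project type: Young Scientists Fund project
Year approved: 2016
Discipline: Other
Project author: 王钰 (Wang Yu)
Affiliation: 山西大学 (Shanxi University)
Funding: 180,000 CNY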
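Chinese abstract (translated): In fields such as bioinformatics and image processing, choosing a suitable model with statistical machine learning methods is a prerequisite for, and the key to, pattern classification. Traditional machine learning usually selects a model directly from an estimate of a performance measure, but this ignores the variability of that estimate and can therefore lead to the wrong model. In recent years, statistical significance tests have been introduced to pick the better of two classification models by testing the difference in their performance. Although such tests incorporate variance information, they depend on distributional assumptions about the data, and comparing every pair among multiple models is computationally very expensive, so they are not suited to selecting directly among many models. Based on this analysis, we aim to provide a variance-regularized classification model selection method within the widely used cross-validation framework. The research mainly covers: (1) accurate and appropriate variance estimates for cross-validated estimates of performance measures; (2) a classification model selection method that incorporates the regularized variance, built on existing cross-validation model selection methods and the proposed variance estimates; (3) theoretical and experimental analysis demonstrating its superiority and feasibility.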
Chinese keywords (translated): model selection; variance; regularization; classification
English abstract: In areas such as bioinformatics and image processing, using statistical machine learning methods to select an appropriate model is the prerequisite for, and key to, pattern classification. In traditional machine learning, model selection is usually performed directly on an estimate of a performance measure. However, such methods do not take the variance of the estimate into account, so a wrong model may be selected. In recent years, statistical significance tests have been introduced to select the better of two classification models by comparing the difference in their performance. Although these tests incorporate variance information, they rely on assumptions about the data distribution, and comparing every pair among multiple models is computationally very expensive, which makes them unsuitable for direct use in selecting among multiple models. Based on the above analysis, we propose a classification model selection method based on variance regularization within the widely used cross-validation framework. This study includes: (1) providing an accurate and appropriate variance estimate for the cross-validated estimate of a performance measure; (2) constructing a classification model selection method that integrates the regularized variance, based on existing cross-validation model selection methods and the proposed variance estimate; (3) demonstrating its superiority and feasibility through theoretical and experimental analysis.
English keywords: Model selection; Variance; Regularization; Classification
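The abstract does not state the exact selection criterion, so the following is only a minimal sketch of the general idea, under loud assumptions: each model is scored by its K-fold cross-validated error plus a penalty `lam` times a variance estimate of that error, and the model with the lowest score is chosen. The function names `cv_mean_and_variance` and `select_model`, the parameter `lam`, and the naive variance estimator (sample variance of the fold errors divided by K, which treats folds as independent) are all hypothetical and are not the project's method; item (1) of the abstract is precisely about finding a better variance estimate than such a naive one. Setting `lam = 0` recovers the traditional approach of selecting on the error estimate alone.

```python
from statistics import mean, variance


def cv_mean_and_variance(fold_errors):
    """Return the K-fold CV error estimate and a naive variance estimate of it.

    Assumption: the variance of the CV mean is estimated as the sample
    variance of the per-fold errors divided by K, as if folds were
    independent. Fold errors are correlated in practice, so this naive
    estimate is only a placeholder for a better estimator.
    """
    k = len(fold_errors)
    return mean(fold_errors), variance(fold_errors) / k


def select_model(fold_errors_by_model, lam):
    """Pick the model minimizing CV error + lam * variance (hypothetical criterion).

    fold_errors_by_model maps a model name to its list of per-fold error rates
    (at least two folds each). With lam == 0 this reduces to plain
    selection on the CV error estimate.
    """
    def score(name):
        m, v = cv_mean_and_variance(fold_errors_by_model[name])
        return m + lam * v

    return min(fold_errors_by_model, key=score)


if __name__ == "__main__":
    # Illustrative, made-up fold errors: model_a has the lower mean error
    # (0.18 vs 0.19) but fluctuates strongly across folds.
    errors = {
        "model_a": [0.10, 0.30, 0.10, 0.30, 0.10],
        "model_b": [0.19, 0.19, 0.19, 0.19, 0.19],
    }
    print(select_model(errors, lam=0.0))   # selects on mean error only
    print(select_model(errors, lam=10.0))  # variance penalty changes the choice
```

In this made-up example, selecting on the mean error alone prefers the unstable `model_a`, while a sufficiently large variance penalty prefers the stable `model_b`. This illustrates, in miniature, the abstract's point that ignoring the variability of the estimate can change which model is selected. How to choose `lam`, and which variance estimator to use, are exactly the open questions the project addresses.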