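Project title: Research on Methods for Processing High-Dimensional Small-Sample Data by Encoding Prior Constraints
Project number: No.61271385
Project type: General Program
Year of approval: 2013
Discipline: Radio electronics, telecommunications technology
Principal investigator: Han Fei (韩飞)
Institution: Jiangsu University (江苏大学)
Funding: 750,000 yuan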
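Chinese abstract: Traditional methods for processing high-dimensional small-sample data treat knowledge-oriented symbolic learning and data-oriented statistical learning as opposed to each other, so their processing performance is low and their interpretability is poor. This project combines prior constraints with particle swarm optimization (PSO) and the extreme learning machine (ELM) to study the processing of high-dimensional small-sample data at the data, model, and algorithm levels. First, statistical and cluster analysis methods are used to extract prior information (constraints), such as feature distribution and function, contained in high-dimensional small-sample data. Second, multiple strategies are used to encode the prior constraints into PSO for feature selection. Third, PSO is combined with a hybrid voting method that encodes the prior constraints to build an ensemble ELM model. Finally, building on the data and the model, prior constraints are encoded to improve the performance of each ELM. The project takes high-dimensional small-sample gene expression profile data as its object of study and tests and refines the proposed methods while processing that data. Because the prior constraints of the problem are encoded, this research can not only improve the accuracy and speed of processing high-dimensional small-sample data but also greatly increase the transparency of machine learning. The project is applied basic research related to machine learning; its further study should bring new developments to fields such as intelligent information processing and promote the development of other sectors of the national economy.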
Chinese keywords: prior information; high-dimensional small-sample data; particle swarm optimization; extreme learning machine
English abstract: Traditional methods for processing high-dimensional, small-sample-size data leave a chasm between symbolic learning, which works with knowledge, and statistical learning, which works with data. As a result, these methods show poor processing performance and poor interpretability. This project studies high-dimensional, small-sample-size data at three levels (data, model, and algorithm) by incorporating prior constraints into particle swarm optimization (PSO) and the extreme learning machine (ELM). First, prior information (constraints) related to the feature distribution and function underlying high-dimensional, small-sample-size data is extracted using statistical and clustering methods. Second, PSO that encodes the prior constraints through different strategies is used to perform feature selection on the data. Third, an ensemble ELM model is established by combining PSO with a hybrid voting scheme that incorporates the prior constraints. Finally, based on the above data and model, the performance of each individual ELM in the ensemble is improved by encoding the prior constraints. The project mainly studies high-dimensional, small-sample-size gene expression profiles, and tests and perfects the proposed met
English keywords: prior information; high-dimensional and small sample size data; particle swarm optimization; extreme learning machine