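Project title: Research on Ensemble Learning Algorithms for High-Dimensional Imbalanced Data
Project number: No.11526161
Project type: Special Fund Project
Year of approval: 2016
Discipline: Mathematical and Physical Sciences and Chemistry
Principal investigator: Yin Qingyan (殷清燕)
Affiliation: Xi'an University of Architecture and Technology (西安建筑科技大学)
Funding: 30,000 CNY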
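Abstract (translated from Chinese): Real-world data sets often combine high feature dimensionality with imbalanced class distributions, and these two properties make effective classification very challenging. Ensemble learning combines multiple base classifiers to solve the same classification problem and has clear advantages in improving a classifier's generalization ability and robustness. This project takes the effective classification of high-dimensional imbalanced data as its goal. It will analyze in depth the shortcomings of existing ensemble learning algorithms on such problems, draw on imbalanced-data handling mechanisms and advanced dimensionality reduction techniques, explore how these can be combined with ensemble learning algorithms based on feature subspaces, design ensemble learning algorithms suited to high-dimensional imbalanced data, and apply these algorithms to gene expression data analysis and protein structure prediction in bioinformatics. The research will provide effective classification algorithms for high-dimensional imbalanced data as well as new techniques and methods for related practical problems, and therefore has significant scientific value and application prospects.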
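Keywords (translated from Chinese): ensemble learning;high-dimensional imbalanced data classification;feature selection;microarray data analysis;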
English abstract: Data sets in practical applications are often characterized by both high dimensionality and imbalanced class distributions, which poses great challenges for the effective classification of high-dimensional imbalanced data. Ensemble learning, which combines multiple classifiers to solve the same problem, has a significant advantage in improving classifier generalization and robustness. This project focuses on high-dimensional imbalanced classification problems and will integrate imbalanced-data preprocessing mechanisms, advanced dimensionality reduction techniques, and ensemble learning to design effective classification algorithms. Finally, we will apply the newly designed algorithms to gene expression data classification and protein structure prediction problems in bioinformatics. The study will not only provide effective classification algorithms for high-dimensional imbalanced data but also lay the foundation for solving related practical problems. Hence, it has important scientific significance and application prospects.
English keywords: ensemble learning;high-dimensional imbalanced data classification;feature selection;microarray data analysis;