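Project title: Research on Ensemble Classifiers for Effectively Fusing Multi-Source Heterogeneous Data
Project number: No.61503253
Project type: Young Scientists Fund
Year of approval: 2016
Discipline: Other
Project author: 何丽芳 (He Lifang)
Author affiliation: Shenzhen University (深圳大学)
Funding amount: 210,000 yuan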
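Abstract (translated from Chinese): Classification of multi-source heterogeneous data has been a major research focus in data mining and machine learning in recent years, with wide application in web page classification, text classification, offline handwritten character recognition, content-based image and video retrieval, and biological information processing. However, owing to the lack of prior knowledge, how to build a generalizable model that effectively fuses the complementary and correlated information in multi-source data remains an important unsolved scientific problem. In view of this, the project proposes to pursue the following three lines of research within the theoretical framework of ensemble learning: (1) for supervised classification, build a support vector-tensor machine ensemble model that couples feature selection with classifier optimization; (2) for semi-supervised classification, build a semi-supervised support vector-tensor machine ensemble model that couples feature selection with classifier optimization; (3) for nonlinear classification, design a nonlinear multi-kernel function based on the vector-tensor compound pattern, and construct a learning algorithm that couples feature selection with classifier optimization. The project aims to reveal the fundamental principles by which pattern representation affects data classification, to propose solutions to the key difficulties in classifying multi-source heterogeneous data, to lay the theoretical and technical foundations for applying the method in related fields, and to open a new theoretical perspective for research on machine learning algorithms typified by ensemble learning.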
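Keywords (translated from Chinese): Multi-view Learning;Ensemble Learning;Semi-supervised Learning;Tensor Analysis;Multiple Kernel Learning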
English abstract: As the applications of classification analysis have expanded, the classification of multi-source heterogeneous data has received significant attention in data mining and machine learning. However, owing to the lack of prior knowledge, it remains challenging to integrate the complementarity and correlation among multi-view features effectively into classification analysis. Motivated by this scientific problem, the proposed research addresses three main themes based on ensemble learning theory: (1) build support vector-tensor machine ensemble models for supervised classification problems via joint feature selection and classifier design; (2) build semi-supervised support vector-tensor machine ensemble models for semi-supervised classification problems via joint feature selection and classifier design; (3) design a nonlinear multi-kernel function based on the vector-tensor compound pattern, together with joint feature selection and (semi-supervised) support vector machine learning, for nonlinear classification problems. The significance of this proposal lies not only in building support vector-tensor machine models and designing algorithms for various applications, but also in enriching the research content of data mining and machine learning and in promoting the development of machine learning and mathematical theory. The expected outcomes will provide solutions to the critical problems in classifying multi-source heterogeneous data, lead to novel techniques and a theoretical basis for their application in various fields, and offer a new theoretical perspective on ensemble learning based algorithms.
English keywords: Multi-view Learning;Ensemble Learning;Semi-supervised Learning;Tensor Analysis;Multiple Kernel Learning