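Project title: Research on Manifold-Learning-Based Nonlinear Dimensionality Reduction Algorithms for Complex Data
Project number: No.61305069
Project type: Young Scientists Fund
Approval year: 2014
Discipline: Automation technology, computer technology
Principal investigator: Chen Jing (陈静)
Institution: Guangdong University of Technology (广东工业大学)
Funding: CNY 200,000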
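Abstract (translated from Chinese): A complex data set is a raw data set that has not undergone preprocessing. The lack of robust, practical dimensionality reduction methods has become a major obstacle to extracting useful information from complex data, and a key problem awaiting solution in machine learning, data mining, pattern recognition, and related fields. Taking the difficulties of dimensionality reduction for complex data sets as its starting point, this project studies the particular problems such data pose: the objects are characterized by many indicators while the sampled data may lie on a high-curvature manifold; the data set contains many outliers and much noise; and part of the data carries label information. Using multi-manifold learning that excludes outliers, a noise-minimizing manifold tangent space representation, semi-supervised learning, and adaptive parameter selection, the project investigates a robust nonlinear dimensionality reduction algorithm based on semi-supervised manifold learning. The difficulties and key problems of the project are the construction of a numerically stable measure of manifold curvature and the study of a semi-supervised, topologically constrained isometric embedding algorithm. Successful completion of the project will allow nonlinear dimensionality reduction algorithms to cope with the various special problems that complex data may present, giving theoretical research on nonlinear dimensionality reduction greater practical relevance.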
Keywords (translated from Chinese): Manifold learning; semi-supervised learning; nonlinear dimensionality reduction
English abstract: Complex data sets contain raw data that have not been preprocessed. The lack of robust and practical methods for dimensionality reduction makes it very difficult to extract useful information from complex data, and is a key issue to be resolved in areas such as machine learning, data mining, and pattern recognition. The project considers the difficulties of dimensionality reduction for complex data and will conduct research on the following issues: the intrinsic dimension of complex data is high, while the data may be sampled from a high-curvature manifold; there are a large number of outliers and much noise; and part of the data contains label information. The project will use methods including multi-manifold learning with outlier removal, manifold tangent space representation with noise minimization, semi-supervised learning, and adaptive parameter selection, and will propose a robust nonlinear dimensionality reduction algorithm based on semi-supervised manifold learning. The difficulties and key issues of this project are a numerically stable measure of manifold curvature and a semi-supervised, topologically constrained isometric embedding algorithm. The successful implementation of the project will enable nonlinear dimensionality reduction algorithms to better deal with complex data sets, so th
English keywords: Manifold learning; Semi-supervised learning; Nonlinear dimensionality reduction