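Project title: Geometric Structure Analysis of High-Dimensional Data
Project number: No.61272341
Project type: General Program
Approval year: 2013
Discipline: Automation technology, computer technology
Principal investigator: Zhouchen Lin (林宙辰)
Institution: Peking University (北京大学)
Funding: 810,000 CNY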
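Chinese abstract (translated): We live in an era of high-dimensional and massive data, and processing such data quickly and effectively is a major challenge. High-dimensional data are distributed in very complex ways. Geometric structure analysis is an important approach to analyzing them, because the geometric structure of the data encodes its clustering and classification information. To address shortcomings of existing methods, this project uses mathematical tools such as sparse representation, semi-Riemannian geometry, and kernel methods to study mathematical models for robust linear or nonlinear multi-submanifold decomposition. It also generalizes the theory and methods of manifold learning on the basis of semi-Riemannian geometry, in order to handle the case where the perceptual distance (the distance on the manifold) is smaller than the Euclidean distance. To characterize the sparsity of data distributions, the project further studies operations that preserve the decomposability of norms and their ability to promote structured sparsity, as well as computable measures of higher-order sparsity; these are currently key problems in sparse representation theory. Finally, the project studies the corresponding fast algorithms, especially low-complexity randomized algorithms, and their GPU implementations, to overcome the computational difficulties of processing high-dimensional data.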
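Chinese keywords (translated): sparse representation; low-rank representation; subspace clustering; manifold learning; first-order optimization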
English abstract: Nowadays we are faced with high-dimensional data in huge amounts, and processing such data quickly and effectively is a major challenge. The distribution of high-dimensional data is very complicated. Geometric structure analysis is an important approach to analyzing high-dimensional data, because the geometric structure of the data encodes its clustering and classification information. To address some drawbacks of existing methods, this project uses several mathematical tools, e.g., sparse representation, semi-Riemannian geometry, and kernel methods, to investigate mathematical models that can robustly decompose data into linear or nonlinear multiple submanifolds. It also generalizes manifold learning theories and methods on the basis of semi-Riemannian geometry, in order to address the case where the perceptual distance (manifold distance) is smaller than the Euclidean distance. To characterize the sparsity of data distributions, this project further explores operations that preserve the decomposability of norms and their ability to induce structured sparsity, as well as computable measures of high-order sparsity, which are key problems in current sparse representation theory. Finally, this project studies the corresponding fast algorithms, especially low-complexity randomized algorithms, and their implementations on GPUs.
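To make "sparse representation" and "first-order optimization" concrete, below is a minimal sketch of ISTA (iterative shrinkage-thresholding), a standard textbook first-order method for the l1-regularized least-squares problem min_x 0.5*||Ax - b||^2 + lam*||x||_1. It is chosen purely for illustration and is not claimed to be the project's algorithm. The function names (`matvec`, `rmatvec`, `soft_threshold`, `ista`), the fixed iteration count, and the use of the squared Frobenius norm as a conservative Lipschitz bound for the step size are all assumptions. It uses plain Python lists so it runs without third-party packages.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Compute A @ x for a dense row-major matrix."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def rmatvec(A: Matrix, r: Vector) -> Vector:
    """Compute A^T @ r without forming the transpose."""
    n = len(A[0])
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(n)]


def soft_threshold(v: Vector, tau: float) -> Vector:
    """Proximal operator of tau * ||.||_1: shrink each entry toward zero by tau."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]


def ista(A: Matrix, b: Vector, lam: float, iters: int = 500) -> Vector:
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 with a fixed step size.

    Assumption: the step is 1/L with L = ||A||_F^2, which upper-bounds the
    true Lipschitz constant ||A||_2^2 of the gradient, so the step is safe
    but possibly smaller than necessary.
    """
    L = sum(a * a for row in A for a in row)
    t = 1.0 / L
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]  # Ax - b
        grad = rmatvec(A, residual)                              # A^T(Ax - b)
        x = soft_threshold([xi - t * gi for xi, gi in zip(x, grad)], t * lam)
    return x


if __name__ == "__main__":
    # With A = I the exact solution is soft_threshold(b, lam).
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(ista(identity, [3.0, 0.5, -2.0], lam=1.0))
```

Each iteration costs only two matrix-vector products plus an elementwise shrinkage, which is why first-order methods of this kind scale to high-dimensional problems; the shrinkage step is what drives small coefficients exactly to zero and yields a sparse representation.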
English keywords: sparse representation; low-rank representation; subspace clustering; manifold learning; first-order optimization