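Project title: Big Data Dimensionality Reduction Based on Infinite Mixture Models and Its Application to Information Retrieval
Project number: No.61472304
Project type: General Program (面上项目)
Year of approval: 2015
Discipline: Other
Project author: 王秀美 (Wang Xiumei)
Author affiliation: 西安电子科技大学 (Xidian University)
Funding: CNY 800,000 (80万元)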
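Chinese abstract (translated): Dimensionality reduction is an effective way to address the curse of dimensionality, and advances in the relevant techniques matter greatly to fields such as machine learning and computer vision. However, when applied to today's big data, dimensionality reduction algorithms face samples whose distributions are non-Gaussian and non-uniform and whose mutual dependencies are complex. To process big data effectively, this project builds on mathematical theory including infinite mixture models, probability theory, graph theory, and optimization, and makes full use of Bayesian inference, latent variable structure, and variational inference to propose nonparametric dimensionality reduction algorithms that can handle complex data. First, infinite mixture models are used to model the multimodal, heterogeneous distribution of big data, and a dimensionality reduction model based on nonparametric Bayesian inference is proposed. Second, to process multi-source data jointly and retrieve it efficiently, a generative model based on latent variable structure is designed to find the intrinsic structural similarity of multi-source data; the resulting low-dimensional data are then passed through a hashing transform that produces binary codes, enabling fast retrieval of multi-source data. Finally, a model inference method based on variational approximate inference is proposed to optimize the nonparametric objective function. The results are expected to provide new ideas and new methods for big-data mining and recognition.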
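Chinese keywords (translated): dimensionality reduction; infinite mixture model; latent variable structure; hashing function; variational inference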
English abstract: Dimensionality reduction (DR) is an effective way to address the curse of dimensionality, and the development of DR techniques plays an important role in machine learning and computer vision. However, traditional DR algorithms cannot meet the requirements of big data: the distribution of the samples is non-Gaussian or non-uniform, and the relationships between samples are complex. To handle such datasets more effectively, the project attempts to establish a DR model on a foundation of infinite mixture models, probability theory, graph theory, and mathematical optimization. By making full use of Bayesian inference, latent variable structure, and variational inference, the project will build a nonparametric DR model for such samples. First, a DR model based on the infinite mixture model will be proposed to deal with multi-modal and heterogeneous datasets. Second, processing multiple content modalities requires joint models that evaluate the similarity and divergence between modalities; in particular, the project will develop generative graphical models that find the low-dimensional structure shared by content in multiple modalities, and then design a hashing function that projects this low-dimensional structure to binary codes for fast retrieval. Finally, an approach based on variational approximate inference will be proposed to optimize the hyper-parameters of the objective function. The research results will provide new ideas and new methods for dealing with complex datasets in data mining and recognition.
English keywords: dimensionality reduction; infinite mixture model; latent variable structure; hashing function; variational inference
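
The retrieval step described in the abstract (hash low-dimensional vectors to binary codes, then search by code) can be illustrated with a minimal sketch. The record does not specify the project's hashing function, so this example assumes a generic stand-in: one bit per random Gaussian direction, set by the sign of the projection, with candidates ranked by Hamming distance. All function names and parameters here are hypothetical.

```python
import random


def make_projection(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions in a dim-dimensional space.

    Assumption: random directions stand in for a learned hashing function.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binary_code(vector, projection):
    """Pack the signs of the projections of `vector` into one integer code."""
    code = 0
    for direction in projection:
        dot = sum(v * w for v, w in zip(vector, direction))
        code = (code << 1) | (1 if dot >= 0.0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")


def search(query, database, projection, top_k=1):
    """Return indices of the top_k database vectors closest to query in Hamming distance."""
    q = binary_code(query, projection)
    codes = [binary_code(x, projection) for x in database]
    ranked = sorted(range(len(database)), key=lambda i: hamming(q, codes[i]))
    return ranked[:top_k]
```

Comparing integer codes with XOR and a bit count is what makes binary codes attractive for fast retrieval: each comparison costs a few machine operations regardless of the original data dimension.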