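Project title: Multimodal Fast Sparse Representor Based on Deep Neural Networks
Project number: No.61473219
Project type: General Program
Year of approval: 2015
Discipline: Automation Technology, Computer Technology
Project author: Wang Jinjun (王进军)
Affiliation: Xi'an Jiaotong University (西安交通大学)
Funding: 820,000 CNY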
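Chinese abstract: Although sparse feature representations extracted from multimedia data can yield good semantic analysis results, they face two main difficulties: excessive computational cost and poor support for joint multimodal representation. Because solving under sparsity constraints is highly nonlinear, developing new mathematical optimization and fusion algorithms to overcome these limitations is becoming increasingly difficult. Building on the physiological evidence that the human visual system represents natural image information sparsely, the research team plans to develop a fast sparse representor using an artificial deep neural network model. The representor computes the sparse code of an input signal directly and quickly in a feed-forward manner, improving feature extraction speed by more than an order of magnitude. It also allows cross-modally aligned sparse representation from multimodal data, supporting effective feature representation under complex conditions that are common in large-scale multimedia data ranging from static images to dynamic audio and video, such as missing modalities, lack of synchronization, and mismatched spatio-temporal resolution. On this basis, a series of novel applications can be derived, including optimal handling of missing modalities, modality-independent intrinsic representation of data, and visualization of abstract features, thereby greatly promoting the broad application of sparse representation methods to large-scale, complex multimodal data.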
Chinese keywords: Sparse representation;Feature extraction;Neural network;Multimodal
English abstract: Despite recent progress in semantic multimedia analysis based on sparse feature representation, extracting sparse features is still a time-consuming process and is difficult to apply in multimodal settings. Because the sparse-coding process is highly nonlinear, developing new mathematical algorithms to overcome these two limitations is becoming increasingly difficult. As an alternative, building on solid biological evidence that the human visual system also adopts sparse feature representation for natural visual data, this proposal plans to develop a sparse feature representor based on deep artificial neural networks. The representor solves the sparse-coding problem with a simple feed-forward computation, so that the feature representation process can achieve an order-of-magnitude speed-up. The representor also accepts multimodal input, aligning and fusing different modalities into a joint sparse representation, so that it can handle complex scenarios typical of large-scale image and video processing, including modality deficiency, loss of synchronization, and differing spatio-temporal resolutions. The resulting representor can support many novel applications, such as feature representation that is robust to modality deficiency, modality-invariant intrinsic feature extraction, and visualization of low-level features, which should substantially broaden the adoption of sparse feature representation in real-world, large-scale multimedia computing applications.
English keywords: Sparsity;Feature Representation;Neural Network;Multimodality