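Project title: Research on sparse, low-dimensional, and compact signal representation methods and their application in speech processing systems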
Project number: No.61471205
Project type: General Program
Year of approval: 2015
Discipline: Radio electronics and telecommunications technology
Project author: Wu Dalei (吴大雷)
Affiliation: Nanjing University of Posts and Telecommunications (南京邮电大学)
Funding: 850,000 yuan
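Abstract (translated from Chinese): This project explores methods based on data sparsity, dimensionality reduction, and compact representation, with the aim of building green (energy-efficient) systems, and studies the feasibility and soundness of applying them to a range of speech processing systems. The proposed work has two parts: theoretical research and research on applications to speech signal systems. In the first part, we focus on three new sparsity-based algorithms: (1) a new sparse principal subspace analysis algorithm, with emphasis on its determinacy in denoising; (2) a new sparse discriminant linear subspace analysis algorithm, together with a new theoretical framework for discriminant subspace analysis that guarantees uniqueness and determinacy; (3) a new sparse multivariate Gaussian mixture modeling method, together with its training algorithm based on maximum likelihood estimation. In the second part, we study the feasibility of applying these three methods to various speech systems, including: (1) sparse speech enhancement systems, studying sparse principal subspace analysis and sparse Gaussian mixture modeling; (2) sparse speaker recognition systems, studying sparse discriminant subspace analysis and sparse Gaussian mixture modeling; (3) sparse speech recognition systems.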
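Keywords (translated from Chinese): dimension reduction; compressive sensing; speech enhancement; speaker recognition; speech recognition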
English abstract: This proposal studies sparse and low-dimensional representation methods in the speech signal processing domain. With the proposed methods, the energy consumption of speech signal processing systems can be reduced to a large extent, making the systems power-efficient and benefiting environmental protection. Based on this idea, we propose to investigate three sparse representation methods for speech signal processing: sparse principal component analysis (PCA), sparse linear discriminant analysis (LDA), and sparse Gaussian mixture models (GMMs). Our research efforts in this project will contribute to both theoretical development and application development. The proposed work is divided into two parts: theory and applications. In the first part, we shall contribute to: (1) investigating the robustness of sparse PCA in denoising; (2) establishing a novel framework for sparse LDA that efficiently estimates a unique discriminant subspace, and investigating its robustness in denoising; (3) developing the theory of a novel sparse GMM, including a maximum likelihood estimation (MLE) algorithm that updates the mean vectors and the covariance matrices of the GMM components simultaneously, and analyzing its convergence properties, such as the optimal convergence rate, the minimax convergence rate, and the upper bound of its estimation error. In the second part, we shall contribute to: (1) investigating the application of sparse PCA, sparse LDA, and sparse GMMs to speech enhancement systems; (2) investigating the application of these three methods to speaker recognition systems and to related applications such as speech recognition.
English keywords: dimension reduction; compressive sensing; speech enhancement; speaker recognition; speech recognition