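Project title: Research on Whispered Speech Intelligibility Enhancement Based on Sparse Time-Frequency Analysis and Binary Mask Estimation
Project number: No.61301295
Project type: Young Scientists Fund project
Year of approval: 2014
Project discipline: Radio electronics, telecommunication technology
Project author: Zhou Jian (周健)
Author affiliation: Anhui University (安徽大学)
Funding amount: 240,000 CNY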
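Chinese abstract: Whispering is a special mode of phonation with very low energy, and the information it carries is easily masked by noise. Conventional speech enhancement methods fail to improve the intelligibility of whispered speech, and existing machine-learning-based binary masking methods still have shortcomings. This project studies single-channel speech enhancement methods that improve the intelligibility of whispered speech in noisy environments through denoising. Building on our earlier observation that a sparse time-frequency spectrum helps improve whispered speech intelligibility, the project will work in the sparse joint time-frequency domain to explore the theory and techniques for extracting speech-dominated time-frequency units by estimating the binary mask value of each unit, and then stably reconstructing the enhanced whispered speech from these sparse time-frequency units. The main research content includes: taking oversampled real-valued discrete Gabor time-frequency analysis as a basis, studying the theory of the undersampled real-valued discrete Gabor transform and expansion, and solving the problem of stable signal reconstruction under undersampling, so as to establish a sparse time-frequency spectrum representation model of whispered speech; and, to overcome the drawbacks of binary mask estimation methods based on supervised machine learning, using convolutive non-negative matrix factorization theory to study an unsupervised binary mask learning method based on the sparse time-frequency spectrum representation, ultimately obtaining clean whispered speech with greatly improved intelligibility.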
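Chinese keywords: Whispered speech; Speech intelligibility; Convolutive non-negative matrix factorization; Binary mask estimation; Asymmetric cost function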
English abstract: Whisper is a special voicing style with very low energy, and the information it conveys is easily masked by noise in adverse environments. Conventional speech enhancement algorithms, however, do not improve the intelligibility of the enhanced speech, and supervised machine-learning-based binary mask estimation methods also have disadvantages. This project studies single-channel speech enhancement methods that aim to improve the intelligibility of whisper in noisy environments. Building on previous work in which we found that a sparse time-frequency spectrum is beneficial to whisper intelligibility, this project explores theories and techniques for extracting speech-dominated time-frequency units by estimating the binary mask of each time-frequency unit, and then reconstructing the enhanced whisper from these sparse time-frequency units in the joint sparse time-frequency domain. Major research contents include: based on oversampled real-valued discrete Gabor time-frequency analysis, studying the undersampled real-valued discrete Gabor transform and expansion theories to solve the stable signal reconstruction problem, and thereby building the sparse time-frequency spectrum representation model of whisper; in order to overcome the defect of the binary mask estimation method whic
English keywords: Whisper;Speech intelligibility;Convolution non-negative matrix factorization;Binary mask estimation;Asymmetric cost function