Project Title: Single-Channel Speech Separation Based on a Probabilistic Acoustic Tube Model
Project Number: No. 61473168
Project Type: General Program
Year Approved: 2015
Discipline: Other
Principal Investigator: Zhijian Ou
Affiliation: Tsinghua University
Funding Amount: 830,000 RMB
Abstract (Chinese): Single-channel speech separation is essentially an underdetermined problem, and model-based methods are an important direction in its study. Despite some success, the speech models used in these methods remain seriously flawed: they model speech incompletely. The basic physical model of speech, the acoustic tube model, identifies the three basic physical quantities of speech - the tube excitation, the excitation gain, and the vocal tract response - and the relationships among them. Yet the speech community has long lacked a probabilistic model that truly unites these three basic quantities to characterize the randomness of speech. This project proposes the probabilistic acoustic tube model and applies it to model-based single-channel speech separation. The main idea is to build a generative model of speech by explicitly representing the physical quantities of speech production and probabilistically describing how they act together to generate speech. The new model will overcome the incompleteness of current speech models and provide better constraints for solving the underdetermined single-channel separation problem. As a generative model, it can also naturally incorporate high-level knowledge, supporting a bidirectional (bottom-up and top-down) flow of information in speech separation, as in schema-driven auditory scene analysis. These measures are expected to bring new breakthroughs in single-channel speech separation research.
Keywords (Chinese): speech separation; computational auditory scene analysis; speech processing
Abstract (English): Single-channel speech separation is essentially an underdetermined problem, and model-based approaches are an important research direction for it. Despite some success, the speech models used in these approaches remain seriously flawed because they model speech incompletely. The basic physical model of speech, the acoustic tube model, identifies three basic physical parameters - the excitation function, the excitation gain, and the vocal tract response - and describes how they interact to generate speech. For a long time, however, the speech community has lacked a unified probabilistic model that integrates these three fundamental parameters to describe the randomness of speech. In this project, the probabilistic acoustic tube (PAT) model is proposed and applied to model-based single-channel speech separation. The main idea is to explicitly encode the physical parameters and describe in probabilistic terms how they interact to generate speech. The new model will overcome the current shortcoming of incomplete modeling and provide better constraints for solving the underdetermined single-channel speech separation problem. Moreover, as a generative model, it can naturally incorporate high-level knowledge and realize a two-way flow of information (bottom-up and top-down) in speech separation, as in schema-driven auditory scene analysis. These new ideas are expected to bring a breakthrough in single-channel speech separation research.
Keywords (English): Speech Separation; Computational Auditory Scene Analysis; Speech Processing
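The generative process the abstract describes - an excitation scaled by a gain and shaped by a vocal tract response - is the classical source-filter view that the acoustic tube model formalizes. Below is a minimal, deterministic sketch of that process for a voiced frame; the sampling rate, pitch, and formant values are illustrative assumptions, not parameters from the project, and the probabilistic treatment (PAT) is not reproduced here.

```python
import numpy as np
from scipy.signal import lfilter

fs = 8000                      # sampling rate in Hz (assumed)
f0 = 100                       # pitch of the voiced excitation in Hz (assumed)
n = fs                         # one second of samples

# Excitation: a periodic impulse train modeling the voiced source.
e = np.zeros(n)
e[::fs // f0] = 1.0

# Excitation gain.
g = 0.5

# Vocal tract response: an all-pole filter; the two pole pairs below
# roughly place two formants (illustrative center frequencies/bandwidths).
poles = []
for freq, bw in [(500, 100), (1500, 150)]:
    r = np.exp(-np.pi * bw / fs)          # pole radius from bandwidth
    w = 2 * np.pi * freq / fs             # pole angle from center frequency
    poles += [r * np.exp(1j * w), r * np.exp(-1j * w)]
a = np.poly(poles).real                   # denominator coefficients

# Speech = gain * (excitation filtered by the vocal tract response).
s = g * lfilter([1.0], a, e)
```

A probabilistic model such as PAT would place distributions over these quantities (pitch, gain, filter parameters) rather than fixing them, which is what allows the likelihoods of competing source hypotheses to be compared during separation.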