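Project title: Research on Hierarchical Speech Synthesis Methods Integrating Speech Production Mechanisms and Statistical Acoustic Modeling
Project number: No.61273032
Project type: General Program
Year of approval: 2013
Discipline: Automation Technology, Computer Technology
Project author: Ling Zhenhua (凌震华)
Affiliation: University of Science and Technology of China
Project funding: 800,000 yuan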
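Chinese abstract: Speech synthesis is a key technology in intelligent human-computer interaction, and flexible control over the speaker identity, timbre, emotion, and other characteristics conveyed by synthetic speech is an important direction for its development. In our Young Scientists Fund project, we introduced articulatory movement parameters into statistical parametric speech synthesis for the first time and, exploiting the direct relationship between these parameters and the speech production mechanism, achieved effective control over the timbre of synthetic speech and the articulation of vowels. This project aims at highly controllable speech synthesis driven by phonetic knowledge, and extends the research of the Young Scientists Fund project. By designing a hierarchical speech synthesis model structure comprising a low-level speech production model and a high-level statistical acoustic model, it will let phonetic knowledge influence and control the prediction of acoustic parameters; beyond articulatory movement parameters alone, it will study modeling and prediction methods for other low-level speech parameters such as formants and prosodic patterns; and, based on the hierarchical speech synthesis model, it will study how synthetic speech can convey paralinguistic and non-linguistic information such as emotion and the influence of environmental noise. This research is significant for enriching speech signal modeling methods, promoting the integration of speech science and speech engineering, and broadening the application areas of speech synthesis systems.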
Chinese keywords: speech synthesis; speech production; acoustic model; prosody modeling; deep learning
English abstract: Speech synthesis is a key technology in intelligent human-machine interaction. Flexible control over the characteristics of synthetic speech, such as speaker, timbre, and emotion, is an important direction in the development of speech synthesis technology. In our previous Young Scholar NSFC project, we introduced articulatory features into statistical parametric speech synthesis for the first time. Exploiting the close relationship between articulatory features and the speech production mechanism, we achieved effective control over the timbre of synthetic speech and the quality of specific vowels. Aiming at controllable speech synthesis driven by phonetic knowledge, this project extends the research of that earlier project. A hierarchical speech synthesis model, containing a low-level speech production model and a high-level statistical acoustic model, will be designed so that phonetic knowledge can control the generation of acoustic features. Besides articulatory features, we will investigate other low-level speech representations, such as formants and prosodic patterns. Based on the proposed hierarchical speech synthesis model, we will study methods for conveying paralinguistic and non-linguistic information, such as emotion and the influence of environmental noise, in synthetic speech. The project is significant for enriching speech signal modeling methods, promoting the integration of speech science and speech engineering, and broadening the application areas of speech synthesis systems.
英文关键词: speech synthesis;speech production;acoustic model;prosodic model;deep learning