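Research on Hierarchical Speech Synthesis Methods Integrating Speech Production Mechanisms and Statistical Acoustic Modeling

Project title: Research on Hierarchical Speech Synthesis Methods Integrating Speech Production Mechanisms and Statistical Acoustic Modeling

Project number: No.61273032

Project type: General Program

Year of approval: 2013

Discipline: Automation technology, computer technology

Project author: Ling Zhenhua (凌震华)

Affiliation: University of Science and Technology of China

Funding: 800,000 yuan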

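Chinese abstract: Speech synthesis is a key technology in intelligent human-computer interaction, and flexible control over the speaker identity, timbre, emotion, and other characteristics conveyed by synthetic speech is an important direction for its development. In our Young Scientists Fund project, we introduced articulatory movement parameters into statistical parametric speech synthesis for the first time and, by exploiting their direct relationship to the speech production mechanism, achieved effective control over the timbre of synthetic speech and the articulation of vowels. This project aims at highly controllable speech synthesis driven by phonetic knowledge and extends the research of the Young Scientists Fund project. By designing a hierarchical speech synthesis model structure that contains a low-level speech production model and a high-level statistical acoustic model, it will let phonetic knowledge influence and control the prediction of acoustic parameters. Going beyond articulatory movement parameters alone, it will study modeling and prediction methods for other low-level speech parameters such as formants and prosodic patterns. Based on the hierarchical speech synthesis model, it will study how synthetic speech can express paralinguistic and non-linguistic information such as emotion and the influence of environmental noise. This research is significant for enriching speech signal modeling methods, promoting the integration of speech science and speech engineering, and broadening the application areas of speech synthesis systems.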

Chinese keywords: speech synthesis; speech production; acoustic model; prosody modeling; deep learning

English abstract: Speech synthesis is a key technology in intelligent human-computer interaction. Flexible control over the characteristics of synthetic speech, such as speaker, timbre, and emotion, is an important direction in the development of speech synthesis technology. In the Young Scholar NSFC project, we introduced articulatory features into statistical parametric speech synthesis for the first time. Based on the close relationship between articulatory features and the speech production mechanism, we achieved effective control over the timbre of synthetic speech and the quality of specific vowels. Aiming at controllable speech synthesis driven by phonetic knowledge, this project plans to extend the research of the previous Young Scholar NSFC project. A hierarchical speech synthesis model, which contains a low-level speech production model and a high-level statistical acoustic model, is to be designed so that phonetic knowledge can control the generation of acoustic features. Besides articulatory features, we will investigate other low-level speech representations, such as formants and prosodic patterns. Based on the proposed hierarchical speech synthesis model, we will study methods for conveying paralinguistic and non-linguistic information, such as emotion and the influence of environmental noise, in synthetic speech. The

English keywords: speech synthesis; speech production; acoustic model; prosodic model; deep learning


Related Content

Speech Synthesis


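Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in the field of information processing. As computer technology has advanced, speech synthesis has evolved from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthetic speech have improved markedly and can largely meet the needs of certain specific application settings. Speech synthesis is now widely used in information announcement systems at banks and hospitals, in car navigation systems, and in automated-answering call centers, and has brought substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other devices closely tied to daily life, applications of speech synthesis are gradually extending into entertainment, language teaching, rehabilitation therapy, and other fields. Speech synthesis can be said to be affecting many aspects of people's lives.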
