The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach

As part of the Human-Computer Interaction field, expressive speech synthesis is a very rich domain, as it requires knowledge in areas such as machine learning, signal processing, sociology, and psychology. In this chapter, we focus mostly on the technical side. From the recording of expressive speech to its modeling, the reader will get an overview of the main paradigms used in this field through some of the most prominent systems and methods. We explain how speech can be represented and encoded with audio features. We present a history of the main methods of Text-to-Speech synthesis: concatenative, parametric, and statistical parametric speech synthesis. Finally, we focus on the last of these, and on the latest techniques, which model Text-to-Speech synthesis as a sequence-to-sequence problem. This enables the use of deep learning blocks such as convolutional and recurrent neural networks, as well as attention mechanisms. The last part of the chapter assembles the different aspects of the theory and summarizes the concepts.

Related content

Speech synthesis

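Speech synthesis, also known as Text-to-Speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in the field of information processing. As computing technology has advanced, speech synthesis has progressed from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain specific applications. Today, speech synthesis is widely used in information announcement systems in banks and hospitals, car navigation systems, and automated-answer call centers, and has generated substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other media closely tied to daily life, applications of speech synthesis are gradually extending into entertainment, language teaching, and rehabilitation therapy. Speech synthesis can be said to be influencing every aspect of people's lives.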
