Continuous Emotional Intensity Controllable Speech Synthesis using Semi-supervised Learning

With the rapid development of speech synthesis systems, recent text-to-speech models can generate natural speech close to what humans produce. Nevertheless, their expressiveness remains limited. In particular, existing emotional speech synthesis models have achieved controllability by interpolating features with scaling parameters in an emotional latent space. However, continuous emotional intensity is difficult to control in the latent space produced by these models, because features such as emotion and speaker identity are entangled. In this paper, we propose a novel method to control the continuous intensity of emotions using semi-supervised learning. The model learns emotions of intermediate intensity using pseudo-labels generated from phoneme-level sequences of speech information. The embedding space built by the proposed model satisfies a uniform grid geometry with an emotional basis. In addition, to improve the naturalness of speech with intermediate emotion, a discriminator is applied to the generation of low-level elements such as duration, pitch, and energy. The experimental results showed that the proposed method was superior in controllability and naturalness. The synthesized speech samples are available at https://tinyurl.com/34zaehh2
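
The abstract contrasts intensity control by scaling interpolated features in an emotional latent space with an embedding space that forms a uniform grid over an emotional basis. The sketch below only illustrates that geometry under simplifying assumptions: it assumes a linear space in which an embedding is a neutral vector plus a weighted sum of per-emotion basis directions, with one scaling parameter per emotion. All names (`emotion_embedding`, `intensity_sweep`, `is_uniform`) and all vectors are hypothetical toy values, not the paper's model, training procedure, or data.

```python
import math
from typing import List, Sequence

Vector = List[float]


def emotion_embedding(neutral: Sequence[float],
                      basis: Sequence[Sequence[float]],
                      alphas: Sequence[float]) -> Vector:
    """Return neutral + sum_k alphas[k] * basis[k].

    Hypothetical setup (not the paper's model): `neutral` is a style
    embedding for neutral speech, each row of `basis` is one emotion's
    direction, and alphas[k] is the intensity of emotion k
    (0.0 = neutral, 1.0 = full intensity).
    """
    if len(basis) != len(alphas):
        raise ValueError("expected one intensity per emotion basis vector")
    out = list(neutral)
    for direction, alpha in zip(basis, alphas):
        if len(direction) != len(out):
            raise ValueError("basis vector dimension does not match neutral")
        for i, d in enumerate(direction):
            out[i] += alpha * d
    return out


def intensity_sweep(neutral, basis, emotion: int, steps: int) -> List[Vector]:
    """Embeddings for one emotion at intensities 0, 1/steps, ..., 1."""
    points = []
    for i in range(steps + 1):
        alphas = [0.0] * len(basis)
        alphas[emotion] = i / steps  # only this emotion's scale is non-zero
        points.append(emotion_embedding(neutral, basis, alphas))
    return points


def is_uniform(points: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """True if consecutive points are equally spaced (a uniform grid line)."""
    gaps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    return all(math.isclose(g, gaps[0], abs_tol=tol) for g in gaps)


if __name__ == "__main__":
    neutral = [0.1, -0.2, 0.0]
    basis = [[0.8, 0.2, 0.0],   # hypothetical "happy" direction
             [0.0, -0.5, 0.6]]  # hypothetical "sad" direction
    happy_sweep = intensity_sweep(neutral, basis, emotion=0, steps=4)
    print(is_uniform(happy_sweep))  # True
    # A mixed style: half-intensity "happy" plus quarter-intensity "sad".
    print(emotion_embedding(neutral, basis, [0.5, 0.25]))
```

In a linear space like this, equal steps in a scaling parameter always give equally spaced embeddings; the abstract's point is that entangled latent spaces from existing models need not behave this way, which is why the geometry of the learned space matters.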
