Cross-speaker style transfer in speech synthesis aims to transfer a style from a source speaker to synthesized speech with a target speaker's timbre. In most previous methods, the synthesized fine-grained prosody features often represent the source speaker's average style, a symptom of the one-to-many problem (i.e., multiple prosody variations correspond to the same text). To address this problem, a strength-controlled semi-supervised style extractor is proposed to disentangle style from content and timbre, improving the representation and interpretability of the global style embedding, which alleviates the one-to-many mapping and data imbalance problems in prosody prediction. A hierarchical prosody predictor is proposed to improve prosody modeling. We find that better style transfer can be achieved by using the source speaker's prosody features that are easily predicted. Additionally, a speaker-transfer-wise cycle consistency loss is proposed to help the model learn unseen style-timbre combinations during training. Experimental results show that the method outperforms the baseline. We provide a website with audio samples.
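To make the cycle consistency idea concrete, below is a minimal PyTorch sketch of one plausible form of a speaker-transfer-wise cycle consistency loss: the style embedding is extracted from the source speaker's mel-spectrogram, speech is synthesized with that style but the target speaker's timbre (a style-timbre combination unseen in training), and the style re-extracted from the synthesized mel is pulled back toward the source style. The abstract does not specify the exact formulation; the module interfaces (`style_extractor`, `synthesizer`) and the L1 distance are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def speaker_transfer_cycle_loss(style_extractor, synthesizer,
                                mel_source, text, target_speaker_id):
    """Illustrative speaker-transfer-wise cycle consistency loss.

    Hypothetical interfaces (not from the paper):
      style_extractor(mel) -> (B, D) global style embedding
      synthesizer(text, style, speaker_id) -> (B, T, n_mels) mel-spectrogram
    """
    # 1. Extract a global style embedding from the source speaker's speech.
    style_src = style_extractor(mel_source)

    # 2. Synthesize the same text with the source style but the *target*
    #    speaker's timbre -- a style-timbre pairing unseen during training.
    mel_transfer = synthesizer(text, style_src, speaker_id=target_speaker_id)

    # 3. Re-extract the style from the transferred speech; the cycle term
    #    requires it to match the original source style embedding.
    style_cycle = style_extractor(mel_transfer)
    return F.l1_loss(style_cycle, style_src.detach())
```

Detaching the source embedding is one common design choice here: it keeps the cycle term from simply collapsing the extractor's output space, so the gradient instead pushes the synthesizer to preserve style under timbre transfer.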