Personalized Lightweight Text-to-Speech: Voice Cloning with Adaptive Structured Pruning

Personalized TTS is an exciting and highly desired application that allows users to train a TTS voice using only a few recordings of their own. However, TTS training typically requires many hours of recordings and a large model, making it unsuitable for deployment on mobile devices. To overcome this limitation, related works typically fine-tune a pre-trained TTS model so that it keeps its ability to generate high-quality audio while adapting to the target speaker's voice. This process is commonly referred to as "voice cloning." Although related works have achieved significant success in changing the TTS model's voice, they still fine-tune from a large pre-trained model, so the voice-cloned model remains large. In this paper, we propose applying trainable structured pruning to voice cloning. By training the structured pruning masks with voice-cloning data, we can produce a unique pruned model for each target speaker. Our experiments demonstrate that, with learnable structured pruning, we can make the model 7 times smaller while achieving comparable voice-cloning performance.

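The abstract does not spell out how the pruning masks are trained, so the following is only a minimal sketch of the general idea of a learnable structured mask, not the authors' method. It assumes one sigmoid gate per output channel of a frozen weight matrix, a simple L1-style sparsity penalty on the gates, and toy "target speaker" data in which only channels 0 and 2 matter. All names (`train_channel_gates`, `prune_rows`, `sparsity`) are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_channel_gates(weights, inputs, targets, sparsity=0.05, lr=0.5, steps=500):
    """Learn one gate logit per output channel (row) of a frozen weight matrix.

    Assumed loss: mean squared error of the gated output plus sparsity * sum(gates).
    """
    n_out = len(weights)
    logits = [0.0] * n_out  # every gate starts at sigmoid(0) = 0.5
    for _ in range(steps):
        grads = [0.0] * n_out
        for x, t in zip(inputs, targets):
            for i, row in enumerate(weights):
                h = sum(w * v for w, v in zip(row, x))  # frozen activation
                g = sigmoid(logits[i])
                # d/d(logit) of (g*h - t)^2, with dg/d(logit) = g*(1-g)
                grads[i] += 2.0 * (g * h - t[i]) * h * g * (1.0 - g)
        for i in range(n_out):
            g = sigmoid(logits[i])
            grad = grads[i] / len(inputs) + sparsity * g * (1.0 - g)
            logits[i] -= lr * grad  # only the gates are updated
    return logits

def prune_rows(weights, logits, threshold=0.5):
    """Structured pruning: drop whole rows whose gate fell below the threshold."""
    keep = [i for i, a in enumerate(logits) if sigmoid(a) > threshold]
    return keep, [weights[i] for i in keep]

# Toy "pre-trained" layer with 4 output channels and 2 inputs (assumed values).
weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
inputs = [[1.0, 2.0], [2.0, -1.0], [-1.0, 1.0]]
# Assumed target-speaker data: only channels 0 and 2 carry useful signal.
wanted = {0, 2}
targets = [
    [sum(w * v for w, v in zip(row, x)) if i in wanted else 0.0
     for i, row in enumerate(weights)]
    for x in inputs
]

logits = train_channel_gates(weights, inputs, targets)
keep, pruned = prune_rows(weights, logits)
params_before = sum(len(r) for r in weights)
params_after = sum(len(r) for r in pruned)
print(keep, params_before, params_after)
```

Because the mask is trained on speaker-specific data, a different target speaker would yield different gates and therefore a different pruned model, which is the property the abstract highlights. The toy compression ratio here says nothing about the 7x figure reported in the paper.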

Related content

Speech synthesis


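Speech synthesis, also known as text-to-speech (TTS), converts arbitrary input text into natural, fluent speech output. It draws on artificial intelligence, psychology, acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in the field of information processing. As computing technology has advanced, speech synthesis has progressed from early formant synthesis to waveform concatenation synthesis and statistical parametric speech synthesis, and then to hybrid speech synthesis. The quality and naturalness of synthesized speech have improved markedly and can largely meet the needs of certain application scenarios. Today, speech synthesis is widely used in information announcement systems in banks and hospitals, car navigation systems, and automated call-center response systems, yielding substantial economic benefits. In addition, with the proliferation of smartphones, MP3 players, PDAs, and other devices closely tied to daily life, its applications are extending into entertainment, language teaching, and rehabilitation therapy. Speech synthesis can be said to be influencing many aspects of people's lives.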
