We introduce VANI, a lightweight, multilingual, accent-controllable speech synthesis system. Our model builds upon the disentanglement strategies proposed in RADMMM and supports explicit control of accent, language, speaker, and fine-grained $F_0$ and energy features for speech synthesis. We use the Indic languages dataset, released for LIMMITS 2023 as part of the ICASSP Signal Processing Grand Challenge, to synthesize speech in three different languages. Our model supports transferring a speaker's language while retaining their voice and the native accent of the target language. We use the large-parameter RADMMM model for Track $1$ and the lightweight VANI model for Tracks $2$ and $3$ of the competition.