Hierarchical Diffusion Models for Singing Voice Neural Vocoder

Recent progress in deep generative models has improved the quality of neural vocoders in the speech domain. However, generating high-quality singing voice remains challenging because singing exhibits a wider variety of musical expression in pitch, loudness, and pronunciation. In this work, we propose a hierarchical diffusion model for singing voice neural vocoders. The proposed method consists of multiple diffusion models operating at different sampling rates: the model at the lowest sampling rate focuses on generating accurate low-frequency components such as pitch, and the other models progressively generate the waveform at higher sampling rates, conditioned on the data at the lower sampling rate and on acoustic features. Experimental results show that the proposed method produces high-quality singing voice for multiple singers, outperforming state-of-the-art neural vocoders with a similar range of computational costs.
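The coarse-to-fine structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sampling rates, the noise schedule, the `toy_denoiser` placeholder (which stands in for a trained network and simply predicts zero noise), and the sample-repetition upsampler are all assumptions made for illustration. Only the overall flow follows the abstract: the lowest-rate model runs first, and each higher-rate model is conditioned on the lower-rate output and the acoustic features.

```python
import numpy as np

def upsample(x, factor):
    # Naive upsampling by sample repetition; a real system would use a proper resampler.
    return np.repeat(x, factor)

def toy_denoiser(x_t, t, cond_lowres, acoustic):
    # Hypothetical placeholder for a trained noise-prediction network.
    # It ignores its conditioning inputs and predicts zero noise.
    return np.zeros_like(x_t)

def reverse_diffusion(length, betas, cond_lowres, acoustic, rng):
    # Standard DDPM ancestral sampling with variance sigma_t^2 = beta_t.
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(length)
    for t in range(len(betas) - 1, -1, -1):
        eps = toy_denoiser(x, t, cond_lowres, acoustic)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(length) if t > 0 else np.zeros(length)
        x = mean + np.sqrt(betas[t]) * noise
    return x

def hierarchical_sample(duration_s, rates, acoustic, n_steps=8, seed=0):
    # Generate from the lowest sampling rate upward; each stage sees the previous output.
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.05, n_steps)  # assumed linear noise schedule
    outputs = {}
    prev, prev_rate = None, None
    for rate in sorted(rates):
        length = round(duration_s * rate)
        cond = None
        if prev is not None:
            # This sketch assumes each rate is an integer multiple of the previous one.
            assert rate % prev_rate == 0
            cond = upsample(prev, rate // prev_rate)
        x = reverse_diffusion(length, betas, cond, acoustic, rng)
        outputs[rate] = x
        prev, prev_rate = x, rate
    return outputs

if __name__ == "__main__":
    out = hierarchical_sample(0.1, [6000, 12000, 24000], acoustic=None)
    for rate, wav in out.items():
        print(rate, wav.shape)
```

With a trained network in place of `toy_denoiser`, the lowest-rate stage would be responsible for low-frequency content such as pitch, and the later stages would add higher-frequency detail on top of the upsampled conditioning signal.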

