MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Previous works \citep{donahue2018adversarial, engel2019gansynth} have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high-quality coherent waveforms by introducing a set of architectural changes and simple training techniques. A subjective evaluation metric (Mean Opinion Score, or MOS) shows the effectiveness of the proposed approach for high-quality mel-spectrogram inversion. To establish the generality of the proposed techniques, we show qualitative results of our model in speech synthesis, music domain translation and unconditional music synthesis. We evaluate the various components of the model through ablation studies and suggest a set of guidelines for designing general-purpose discriminators and generators for conditional sequence synthesis tasks. Our model is non-autoregressive and fully convolutional, has significantly fewer parameters than competing models, and generalizes to unseen speakers for mel-spectrogram inversion. Our PyTorch implementation runs more than 100x faster than real-time on a GTX 1080Ti GPU and more than 2x faster than real-time on CPU, without any hardware-specific optimization tricks. A blog post with samples and accompanying code is coming soon.
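The abstract gives no layer-level details, but two of its claims can be made concrete with simple arithmetic: a non-autoregressive, fully convolutional generator produces every output sample in one pass, so its output length is fixed by its upsampling layers, and "faster than real-time" is the ratio of audio duration to synthesis time. The sketch below assumes the generator upsamples mel frames with a stack of transposed 1D convolutions. The stride schedule (8, 8, 2, 2), the resulting 256-sample hop, the 22050 Hz sample rate, the kernel/padding rule and all helper names are illustrative assumptions, not settings stated in this abstract. The length formula is the one documented for PyTorch's `torch.nn.ConvTranspose1d`.

```python
from math import prod


def conv_transpose1d_len(length_in, stride, kernel_size, padding=0,
                         output_padding=0, dilation=1):
    """Output length of a transposed 1D convolution (formula documented
    for torch.nn.ConvTranspose1d)."""
    return ((length_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


def generator_output_len(n_frames, strides):
    """Waveform length after a hypothetical stack of upsampling layers.

    Assumption: each layer uses kernel_size = 2 * stride,
    padding = stride // 2 + stride % 2 and output_padding = stride % 2,
    which makes every layer multiply the length by exactly `stride`.
    """
    length = n_frames
    for s in strides:
        # output_padding must stay below the stride, so require s >= 2
        assert s >= 2, "this padding rule assumes strides of at least 2"
        length = conv_transpose1d_len(length, s, 2 * s,
                                      padding=s // 2 + s % 2,
                                      output_padding=s % 2)
    return length


def realtime_speedup(n_samples, sample_rate, elapsed_seconds):
    """Seconds of audio produced per second of wall-clock synthesis time."""
    return (n_samples / sample_rate) / elapsed_seconds


if __name__ == "__main__":
    strides = (8, 8, 2, 2)          # hypothetical upsampling schedule
    hop = prod(strides)             # mel hop length the stack must match
    n_samples = generator_output_len(100, strides)
    print(hop, n_samples, n_samples == 100 * hop)
    # Hypothetical timing: 100 frames synthesized in 10 ms at 22050 Hz
    print(f"{realtime_speedup(n_samples, 22050, 0.01):.1f}x real-time")
```

The design point the arithmetic illustrates is that the product of the strides must equal the hop length used to compute the mel-spectrogram; otherwise the generated waveform would not line up with its conditioning frames.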

