UnivNet:高纤维波形生成多分辨率分光谱分光谱分谱神经导体 (UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation) - 专知论文

会员服务 ·

0

判别器 · INFORMS · MoDELS · FAST · 线性的 ·

2021 年 6 月 15 日

UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation

翻译：UnivNet:高纤维波形生成多分辨率分光谱分光谱分谱神经导体

Won Jang,Dan Lim,Jaesam Yoon,Bongwan Kim,Juntae Kim

from arxiv, Accepted to INTERSPEECH 2021

Most neural vocoders employ band-limited mel-spectrograms to generate waveforms. If full-band spectral features are used as the input, the vocoder can be provided with as much acoustic information as possible. However, in some models employing full-band mel-spectrograms, an over-smoothing problem occurs as part of which non-sharp spectrograms are generated. To address this problem, we propose UnivNet, a neural vocoder that synthesizes high-fidelity waveforms in real time. Inspired by works in the field of voice activity detection, we added a multi-resolution spectrogram discriminator that employs multiple linear spectrogram magnitudes computed using various parameter sets. Using full-band mel-spectrograms as input, we expect to generate high-resolution signals by adding a discriminator that employs spectrograms of multiple resolutions as the input. In an evaluation on a dataset containing information on hundreds of speakers, UnivNet obtained the best objective and subjective results among competing models for both seen and unseen speakers. These results, including the best subjective score for text-to-speech, demonstrate the potential for fast adaptation to new speakers without a need for training from scratch.

翻译：多数神经蒸气器使用带宽光谱谱仪生成波形。如果将全波频谱特征用作输入内容, vocoder可以提供尽可能多的声学信息。但是, 在使用全波光谱仪的某些模型中, 产生非正谱光谱的超移动问题。为了解决这个问题, 我们提议 UnivNet, 是一个实时合成高菲度波形的神经电解码器。在语音活动探测领域工作的启发下, 我们添加了一个多分辨率光谱分析器, 使用多种参数组计算多个线性光谱量的多分辨率光谱分析器。使用全波光谱仪作为输入, 我们期望通过添加一个使用多分辨率分光谱仪作为输入来生成高分辨率信号。在对包含数百个发言者的信息的数据集进行评估时, UnivitNet 获得了来自不同声音活动的相互竞争的模型中的最佳客观和主观结果。这些结果, 包括不需从可见的和看不见的演讲者进行快速扫描的文字调整, 包括用于快速的文字调整的最佳主观分数。

0

相关内容

判别器

【ICML2021】对抗学习条件变分自编码器的端到端文本转语音

专知会员服务

10+阅读 · 2021年6月21日

【WWW 2020 】基于关系对抗网络的低资源知识图谱补全，Relation Adversarial Network for Low Resource Knowledge Graph Completion

【WWW 2020 】基于关系对抗网络的低资源知识图谱补全，Relation Adversarial Network for Low Resource Knowledge Graph Completion

专知会员服务

37+阅读 · 2020年6月7日

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

专知会员服务

23+阅读 · 2020年4月22日

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

专知会员服务

36+阅读 · 2020年4月14日

【ACL2020-Google】学习鲁棒度量的文本生成，BLEURT: Learning Robust Metrics for Text Generation

【ACL2020-Google】学习鲁棒度量的文本生成，BLEURT: Learning Robust Metrics for Text Generation

专知会员服务

17+阅读 · 2020年4月10日

【MIT】条件说唱歌词生成与去噪自动编码器，Conditional Rap Lyrics Generation with Denoising Autoencoders

【MIT】条件说唱歌词生成与去噪自动编码器，Conditional Rap Lyrics Generation with Denoising Autoencoders

专知会员服务

16+阅读 · 2020年4月8日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation

Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation

Arxiv

0+阅读 · 2021年8月16日

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Arxiv

7+阅读 · 2021年6月11日

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Arxiv

7+阅读 · 2019年10月8日

Generative Adversarial Networks and Conditional Random Fields for Hyperspectral Image Classification

Arxiv

3+阅读 · 2019年5月12日

Neural Speech Synthesis with Transformer Network

Neural Speech Synthesis with Transformer Network

Arxiv

5+阅读 · 2019年1月30日

Close to Human Quality TTS with Transformer

Arxiv

3+阅读 · 2018年11月13日

Paraphrase Generation with Deep Reinforcement Learning

Paraphrase Generation with Deep Reinforcement Learning

Arxiv

4+阅读 · 2018年8月23日

Speech waveform synthesis from MFCC sequences with generative adversarial networks

Arxiv

5+阅读 · 2018年4月3日

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification

Arxiv

11+阅读 · 2018年2月10日

SpectralNet: Spectral Clustering using Deep Neural Networks

Arxiv

11+阅读 · 2018年1月10日

VIP会员

文章信息

相关主题

相关VIP内容

【ICML2021】对抗学习条件变分自编码器的端到端文本转语音

专知会员服务

10+阅读 · 2021年6月21日

【WWW 2020 】基于关系对抗网络的低资源知识图谱补全，Relation Adversarial Network for Low Resource Knowledge Graph Completion

【WWW 2020 】基于关系对抗网络的低资源知识图谱补全，Relation Adversarial Network for Low Resource Knowledge Graph Completion

专知会员服务

37+阅读 · 2020年6月7日

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

【领域对抗学习的低资源文本分类】Low-Resource Text Classification using Domain-Adversarial Learning

专知会员服务

23+阅读 · 2020年4月22日

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

【微软-ACL2020】TinyMBERT: Multi-Stage Distillation Framework for Massive Multi-lingual NER

专知会员服务

36+阅读 · 2020年4月14日

【ACL2020-Google】学习鲁棒度量的文本生成，BLEURT: Learning Robust Metrics for Text Generation

【ACL2020-Google】学习鲁棒度量的文本生成，BLEURT: Learning Robust Metrics for Text Generation

专知会员服务

17+阅读 · 2020年4月10日

【MIT】条件说唱歌词生成与去噪自动编码器，Conditional Rap Lyrics Generation with Denoising Autoencoders

【MIT】条件说唱歌词生成与去噪自动编码器，Conditional Rap Lyrics Generation with Denoising Autoencoders

专知会员服务

16+阅读 · 2020年4月8日

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

社交网络上议题社群的公共焦虑研究，中国人民大学新闻学院塔娜讲师，第八届全国社会媒体处理大会SMP2019

专知会员服务

15+阅读 · 2019年10月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

自动驾驶轨迹规划中的基础模型：进展综述与开放挑战

《用于提升多域战备的大型语言模型辅助场景生成器》报告

【斯坦福博士论文】为人类使用优化 AI 模型

国防领域人工智能规模化应用的理论与实践

相关资讯

【资源】语音增强资源集锦

【资源】语音增强资源集锦

专知

8+阅读 · 2020年7月4日

强化学习三篇论文避免遗忘等

强化学习三篇论文避免遗忘等

CreateAMind

20+阅读 · 2019年5月24日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

Hierarchical Imitation - Reinforcement Learning

Hierarchical Imitation - Reinforcement Learning

CreateAMind

19+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation

Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation

Arxiv

0+阅读 · 2021年8月16日

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Arxiv

7+阅读 · 2021年6月11日

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Arxiv

7+阅读 · 2019年10月8日

Generative Adversarial Networks and Conditional Random Fields for Hyperspectral Image Classification

Arxiv

3+阅读 · 2019年5月12日

Neural Speech Synthesis with Transformer Network

Neural Speech Synthesis with Transformer Network

Arxiv

5+阅读 · 2019年1月30日

Close to Human Quality TTS with Transformer

Arxiv

3+阅读 · 2018年11月13日

Paraphrase Generation with Deep Reinforcement Learning

Paraphrase Generation with Deep Reinforcement Learning

Arxiv

4+阅读 · 2018年8月23日

Speech waveform synthesis from MFCC sequences with generative adversarial networks

Arxiv

5+阅读 · 2018年4月3日

Generative Adversarial Networks and Probabilistic Graph Models for Hyperspectral Image Classification

Arxiv

11+阅读 · 2018年2月10日

SpectralNet: Spectral Clustering using Deep Neural Networks

Arxiv

11+阅读 · 2018年1月10日

微信扫码咨询专知VIP会员