它$$ text{o}$TTS 和它$ text{o}$Wave: 音频生成所需的所有你所需要的线性存储差异方程式 (It$\hat{\text{o}}$TTS and It$\hat{\text{o}}$Wave: Linear Stochastic Differential Equation Is All You Need For Audio Generation) - 专知论文

会员服务 ·

0

易处理的 · 线性的 · 语音合成 · SimPLe · Pair ·

2021 年 8 月 12 日

It$\hat{\text{o}}$TTS and It$\hat{\text{o}}$Wave: Linear Stochastic Differential Equation Is All You Need For Audio Generation

翻译：它$$ text{o}$TTS 和它$ text{o}$Wave: 音频生成所需的所有你所需要的线性存储差异方程式

Shoule Wu,Ziqiang Shi

from arxiv, The generated audio samples are available at https://shiziqiang.github.io/ito_audio/

In this paper, we propose to unify the two aspects of voice synthesis, namely text-to-speech (TTS) and vocoder, into one framework based on a pair of forward and reverse-time linear stochastic differential equations (SDE). The solutions of this SDE pair are two stochastic processes, one of which turns the distribution of mel spectrogram (or wave), that we want to generate, into a simple and tractable distribution. The other is the generation procedure that turns this tractable simple signal into the target mel spectrogram (or wave). The model that generates mel spectrogram is called It$\hat{\text{o}}$TTS, and the model that generates wave is called It$\hat{\text{o}}$Wave. It$\hat{\text{o}}$TTS and It$\hat{\text{o}}$Wave use the Wiener process as a driver to gradually subtract the excess signal from the noise signal to generate realistic corresponding meaningful mel spectrogram and audio respectively, under the conditional inputs of original text or mel spectrogram. The results of the experiment show that the mean opinion scores (MOS) of It$\hat{\text{o}}$TTS and It$\hat{\text{o}}$Wave can exceed the current state-of-the-art methods, and reached 3.925$\pm$0.160 and 4.35$\pm$0.115 respectively. The generated audio samples are available at https://shiziqiang.github.io/ito_audio/. All authors contribute equally to this work.

翻译：在本文中, 我们提议将语音合成的两个方面, 即文本到语音( TTS) 和vocoder 合并成一个框架, 以一对前方和反向线性线性分解方程为基础。这个 SDE 配对的解决方案是两个随机过程, 其中之一是将我们想要生成的光谱( 或波) 的分布转换成一个简单和可移动的分布。另一个是将这个可移动的简单信号转换成目标Mel光谱( 或波) 的生成程序。生成Mel光谱的模型叫做 I$\ hat\ text{ o_ $TTS, 而生成波的模型叫 It$\ hat text{ o_ text{ $} o $ wave。其中之一是我们想要生成的光谱( 或波或波或波 ) 将 Wiener 进程作为驱动器, 逐渐减少这个声音信号的多余信号, 以产生现实的对应的线性线性光谱/ 和音频谱。在原始的输入值$ 4. 或Mexlialtial_ lio=x_ salma_ slex

0

相关内容

易处理的

【ICCV2021】基于Transformer 的神经绘画

专知会员服务

23+阅读 · 2021年9月20日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

不可错过！华盛顿大学最新《生成式模型》课程，附PPT

不可错过！华盛顿大学最新《生成式模型》课程，附PPT

专知会员服务

65+阅读 · 2020年12月11日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

57+阅读 · 2020年11月21日

【InterSpeech2020】最新《高效与灵活的机器学习实现》教程，177页ppt

【InterSpeech2020】最新《高效与灵活的机器学习实现》教程，177页ppt

专知会员服务

27+阅读 · 2020年11月10日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【哈佛-ICLR2020】基于残差能量模型的文本生成，Residual Energy-Based Models for Text Generation

【哈佛-ICLR2020】基于残差能量模型的文本生成，Residual Energy-Based Models for Text Generation

专知会员服务

11+阅读 · 2020年4月27日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

论文共读 | Attention is All You Need

论文共读 | Attention is All You Need

黑龙江大学自然语言处理实验室

14+阅读 · 2017年9月7日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

Arxiv

0+阅读 · 2021年10月11日

Neural HMMs are all you need (for high-quality attention-free TTS)

Arxiv

0+阅读 · 2021年10月9日

A Stochastic Newton Algorithm for Distributed Convex Optimization

A Stochastic Newton Algorithm for Distributed Convex Optimization

Arxiv

1+阅读 · 2021年10月7日

GANtron: Emotional Speech Synthesis with Generative Adversarial Networks

Arxiv

0+阅读 · 2021年10月6日

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Arxiv

7+阅读 · 2021年6月11日

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Arxiv

7+阅读 · 2019年10月8日

Question Generation by Transformers

Question Generation by Transformers

Arxiv

5+阅读 · 2019年9月14日

Neural source-filter-based waveform model for statistical parametric speech synthesis

Arxiv

4+阅读 · 2018年11月26日

Generating Thematic Chinese Poetry using Conditional Variational Autoencoders with Hybrid Decoders

Arxiv

8+阅读 · 2018年1月30日

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Arxiv

18+阅读 · 2018年1月5日

VIP会员

文章信息

相关主题

相关VIP内容

【ICCV2021】基于Transformer 的神经绘画

专知会员服务

23+阅读 · 2021年9月20日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

不可错过！华盛顿大学最新《生成式模型》课程，附PPT

不可错过！华盛顿大学最新《生成式模型》课程，附PPT

专知会员服务

65+阅读 · 2020年12月11日

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

【经典书】应用随机微分方程，324页pdf，Applied Stochastic Differential Equations

专知会员服务

57+阅读 · 2020年11月21日

【InterSpeech2020】最新《高效与灵活的机器学习实现》教程，177页ppt

【InterSpeech2020】最新《高效与灵活的机器学习实现》教程，177页ppt

专知会员服务

27+阅读 · 2020年11月10日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【哈佛-ICLR2020】基于残差能量模型的文本生成，Residual Energy-Based Models for Text Generation

【哈佛-ICLR2020】基于残差能量模型的文本生成，Residual Energy-Based Models for Text Generation

专知会员服务

11+阅读 · 2020年4月27日

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

微软发布DialoGPT预训练语言模型，论文与代码 Large-Scale Generative Pre-training for Conversational Response Generation

专知会员服务

28+阅读 · 2019年11月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

36+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】扩展可扩展会话推荐的边界

别想太多：高效 R1 风格大型推理模型综述

【ACMMM2025】EvoVLMA: 进化式视觉-语言模型自适应

智能体网络：用AI智能体编织下一代网络

相关资讯

LibRec 精选：AutoML for Contextual Bandits

LibRec 精选：AutoML for Contextual Bandits

LibRec智能推荐

7+阅读 · 2019年9月19日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

语音顶级会议Interspeech2018接受论文列表！

语音顶级会议Interspeech2018接受论文列表！

专知

6+阅读 · 2018年6月10日

条件GAN重大改进！cGANs with Projection Discriminator

条件GAN重大改进！cGANs with Projection Discriminator

CreateAMind

8+阅读 · 2018年2月7日

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

最新5篇生成对抗网络相关论文推荐—FusedGAN、DeblurGAN、AdvGAN、CipherGAN、MMD GANS

专知

23+阅读 · 2018年1月18日

gan生成图像at 1024² 的代码论文

gan生成图像at 1024² 的代码论文

CreateAMind

4+阅读 · 2017年10月31日

Adversarial Variational Bayes: Unifying VAE and GAN 代码

Adversarial Variational Bayes: Unifying VAE and GAN 代码

CreateAMind

7+阅读 · 2017年10月4日

论文共读 | Attention is All You Need

论文共读 | Attention is All You Need

黑龙江大学自然语言处理实验室

14+阅读 · 2017年9月7日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

Arxiv

0+阅读 · 2021年10月11日

Neural HMMs are all you need (for high-quality attention-free TTS)

Arxiv

0+阅读 · 2021年10月9日

A Stochastic Newton Algorithm for Distributed Convex Optimization

A Stochastic Newton Algorithm for Distributed Convex Optimization

Arxiv

1+阅读 · 2021年10月7日

GANtron: Emotional Speech Synthesis with Generative Adversarial Networks

Arxiv

0+阅读 · 2021年10月6日

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Arxiv

7+阅读 · 2021年6月11日

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis

Arxiv

7+阅读 · 2019年10月8日

Question Generation by Transformers

Question Generation by Transformers

Arxiv

5+阅读 · 2019年9月14日

Neural source-filter-based waveform model for statistical parametric speech synthesis

Arxiv

4+阅读 · 2018年11月26日

Generating Thematic Chinese Poetry using Conditional Variational Autoencoders with Hybrid Decoders

Arxiv

8+阅读 · 2018年1月30日

Audio Adversarial Examples: Targeted Attacks on Speech-to-Text

Arxiv

18+阅读 · 2018年1月5日

微信扫码咨询专知VIP会员