Recently, non-autoregressive neural vocoders have provided remarkable performance in generating high-fidelity speech and have been able to produce synthetic speech in real-time. However, non-autoregressive neural vocoders such as WaveGlow lag far behind autoregressive neural vocoders like WaveFlow in terms of modeling audio signals due to their limited expressiveness. In addition, though NanoFlow is a state-of-the-art autoregressive neural vocoder with a very small number of parameters, its performance is marginally lower than that of WaveFlow. Therefore, in this paper, we propose a new type of autoregressive neural vocoder called FlowVocoder, which has a small memory footprint and is able to generate high-fidelity audio in real-time. Our proposed model improves the expressiveness of flow blocks by employing a mixture of Cumulative Distribution Functions (CDFs) for the bipartite transformation. Hence, the proposed model is capable of modeling waveform signals as well as WaveFlow, while its memory footprint is much smaller than WaveFlow's. As shown in experiments, FlowVocoder achieves competitive results with baseline methods in terms of both subjective and objective evaluation, and it is more suitable for real-time text-to-speech applications.
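The mixture-of-CDFs coupling transform referred to above can be illustrated with a minimal sketch. This is only an assumed, simplified illustration: the conditioner `param_net`, the tensor shapes, and all variable names here are hypothetical and not taken from the paper's implementation; a mixture of logistic CDFs is used as a concrete example of such a transform.

```python
import numpy as np

def mixture_logistic_cdf(x, log_pi, mu, log_s):
    """Mixture-of-logistics CDF: sum_k pi_k * sigmoid((x - mu_k) / s_k).

    x:      (..., 1) values to transform
    log_pi: (..., K) unnormalized mixture log-weights
    mu:     (..., K) component means
    log_s:  (..., K) component log-scales
    """
    # softmax over the K mixture components
    pi = np.exp(log_pi - log_pi.max(axis=-1, keepdims=True))
    pi = pi / pi.sum(axis=-1, keepdims=True)
    # standardized inputs for each component
    z = (x - mu) * np.exp(-log_s)
    # weighted sum of logistic CDFs, mapping x into (0, 1)
    return (pi / (1.0 + np.exp(-z))).sum(axis=-1)

def coupling_forward(x_a, x_b, param_net):
    """Bipartite (coupling) step: x_a conditions the CDF applied to x_b."""
    log_pi, mu, log_s = param_net(x_a)   # hypothetical conditioner network
    y_b = mixture_logistic_cdf(x_b[..., None], log_pi, mu, log_s)
    return x_a, y_b                      # x_a passes through unchanged

# Toy usage with a fixed, hand-written conditioner (for illustration only)
if __name__ == "__main__":
    K = 3
    x_a = np.random.randn(8)
    x_b = np.random.randn(8)
    toy_net = lambda h: (np.zeros((8, K)), np.zeros((8, K)), np.zeros((8, K)))
    _, y_b = coupling_forward(x_a, x_b, toy_net)
    print(y_b)  # values in (0, 1)
```

Because the CDF of a mixture is strictly monotonic, this elementwise transform remains invertible, which is what allows it to replace a plain affine coupling while adding expressiveness.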