In neural text-to-speech (TTS), two-stage systems, i.e., cascades of separately trained models, have shown synthesis quality close to human speech. For example, FastSpeech2 transforms an input text into a mel-spectrogram and HiFi-GAN then generates a raw waveform from that mel-spectrogram; the two models are called an acoustic feature generator and a neural vocoder, respectively. However, their training pipeline is somewhat cumbersome in that it requires fine-tuning and an accurate speech-text alignment for optimal performance. In this work, we present an end-to-end text-to-speech (E2E-TTS) model that has a simplified training pipeline and outperforms a cascade of separately trained models. Specifically, our proposed model jointly trains FastSpeech2 and HiFi-GAN with an alignment module. Since there is no acoustic feature mismatch between training and inference, it does not require fine-tuning. Furthermore, we remove the dependency on an external speech-text alignment tool by adopting an alignment learning objective within our joint training framework. Experiments on the LJSpeech corpus show that the proposed model outperforms publicly available, state-of-the-art implementations from ESPNet2-TTS in subjective evaluation (MOS) and some objective evaluations.
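To make the pipeline contrast concrete, the following is a minimal PyTorch sketch, not the paper's implementation: `ToyAcousticModel`, `ToyVocoder`, and all dimensions are placeholder assumptions, and L1 losses stand in for the real adversarial and variance objectives. It illustrates why a cascade suffers an acoustic feature mismatch while joint training does not: the cascade's vocoder is fit to ground-truth mel-spectrograms but receives predicted ones at inference, whereas the joint model backpropagates through the vocoder applied to the acoustic model's own predictions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyAcousticModel(nn.Module):
    """Placeholder for FastSpeech2: token IDs -> mel-spectrogram frames."""
    def __init__(self, vocab_size=64, mel_dim=80):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 128)
        self.proj = nn.Linear(128, mel_dim)

    def forward(self, tokens):
        return self.proj(self.embed(tokens))  # (B, T, mel_dim)

class ToyVocoder(nn.Module):
    """Placeholder for HiFi-GAN: mel frames -> raw waveform samples."""
    def __init__(self, mel_dim=80, hop=256):
        super().__init__()
        self.upsample = nn.Linear(mel_dim, hop)

    def forward(self, mel):
        return self.upsample(mel).flatten(1)  # (B, T * hop)

acoustic, vocoder = ToyAcousticModel(), ToyVocoder()
tokens = torch.randint(0, 64, (2, 10))  # dummy text batch
target_mel = torch.randn(2, 10, 80)     # dummy ground-truth mels
target_wave = torch.randn(2, 10 * 256)  # dummy ground-truth waveform

# Cascade: each stage is fit to ground truth separately. At inference the
# vocoder receives *predicted* mels it never saw during training -- the
# feature mismatch that motivates fine-tuning on predicted features.
cascade_mel_loss = F.l1_loss(acoustic(tokens), target_mel)
cascade_voc_loss = F.l1_loss(vocoder(target_mel), target_wave)

# Joint (E2E): the vocoder consumes the acoustic model's own output, so
# gradients flow end to end and train/inference inputs match by design.
joint_loss = F.l1_loss(vocoder(acoustic(tokens)), target_wave) + cascade_mel_loss
joint_loss.backward()
```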
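The alignment learning objective can be sketched as a forward-sum over monotonic speech-text alignments. The snippet below is a hedged illustration of that general recipe, assuming a simple negative-distance score matrix between hypothetical text and mel encoder outputs and reusing PyTorch's CTC loss to perform the monotonic marginalization; it is not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def alignment_loss(text_emb, mel_emb):
    """Forward-sum over monotonic speech-text alignments via CTC.

    text_emb: (B, N, D) token embeddings; mel_emb: (B, T, D) frame
    embeddings, with T >= N. Both encoders are assumed to exist upstream.
    """
    B, N, _ = text_emb.shape
    T = mel_emb.shape[1]
    # Frame-to-token log-posteriors from a negative-distance score matrix
    # (an illustrative scoring choice, not the paper's).
    log_probs = F.log_softmax(-torch.cdist(mel_emb, text_emb), dim=-1)  # (B, T, N)
    # CTC expects (T, B, C) with a blank class at index 0; make the blank
    # ~impossible so every frame must emit a token, shifting classes to 1..N.
    log_probs = F.pad(log_probs, (1, 0), value=-1e4).transpose(0, 1)    # (T, B, N+1)
    targets = torch.arange(1, N + 1).expand(B, N)  # each token once, in order
    return F.ctc_loss(
        log_probs, targets,
        input_lengths=torch.full((B,), T, dtype=torch.long),
        target_lengths=torch.full((B,), N, dtype=torch.long),
    )

# Usage with random embeddings: 6 tokens aligned against 40 mel frames.
print(alignment_loss(torch.randn(2, 6, 16), torch.randn(2, 40, 16)))
```

Because the target token classes are all distinct and appear exactly once in order, the CTC recursion here sums exactly over monotonic, non-skipping alignments, which is what lets the model learn durations without an external aligner.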