Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling


MoDELS · Speech Synthesis · SOFT · Tokenizer · Attention Mechanism

August 29, 2021


Isaac Elias,Heiga Zen,Jonathan Shen,Yu Zhang,Ye Jia,RJ Skerry-Ryan,Yonghui Wu

From arXiv; submitted to INTERSPEECH 2021.

This paper introduces Parallel Tacotron 2, a non-autoregressive neural text-to-speech model with a fully differentiable duration model that does not require supervised duration signals. The duration model is based on a novel attention mechanism and an iterative reconstruction loss based on Soft Dynamic Time Warping (Soft-DTW); with it, the model learns token-frame alignments as well as token durations automatically. Experimental results show that Parallel Tacotron 2 outperforms baselines in subjective naturalness across several diverse multi-speaker evaluations. Its duration control capability is also demonstrated.
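The key ingredient that removes the need for ground-truth durations is a reconstruction loss that tolerates misaligned frames. The sketch below shows the standard Soft-DTW recurrence in plain Python, under stated assumptions: frames are lists of floats, the frame cost is an L1 distance, and the names `soft_dtw`, `softmin`, `l1_distance`, and the smoothing parameter `gamma` are illustrative, not taken from the paper. The paper's actual loss may differ (for example in its frame distance or in extra alignment constraints), and the "iterative" part of the loss is not shown here; this only computes the discrepancy for one predicted/target pair.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one spectrogram frame, e.g. a mel-band vector (assumed layout)


def l1_distance(x: Frame, y: Frame) -> float:
    """Frame-level cost; L1 is an assumption, any non-negative distance works."""
    return sum(abs(a - b) for a, b in zip(x, y))


def softmin(values: List[float], gamma: float) -> float:
    """Smoothed minimum -gamma * log(sum(exp(-v / gamma))), computed stably.

    Infinite entries (unreachable cells) contribute nothing to the sum.
    As gamma -> 0 this approaches the hard min().
    """
    finite = [v for v in values if v != math.inf]
    if not finite:
        return math.inf
    m = min(finite)
    # Shift by the minimum so the largest exponent is exp(0) = 1 (log-sum-exp trick).
    total = sum(math.exp(-(v - m) / gamma) for v in finite)
    return m - gamma * math.log(total)


def soft_dtw(pred: Sequence[Frame], target: Sequence[Frame], gamma: float = 0.1) -> float:
    """Soft-DTW discrepancy between a predicted and a target frame sequence.

    r[i][j] holds the soft-minimal cost of aligning pred[:i] with target[:j].
    The sequences may differ in length, so the prediction does not have to
    match the reference frame-for-frame.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    n, m = len(pred), len(target)
    r = [[math.inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = l1_distance(pred[i - 1], target[j - 1])
            # Predecessors: the diagonal step, plus the two steps that reuse a frame
            # of one sequence (this is what lets the loss absorb timing differences).
            r[i][j] = cost + softmin([r[i - 1][j - 1], r[i - 1][j], r[i][j - 1]], gamma)
    return r[n][m]


if __name__ == "__main__":
    pred = [[0.0], [1.0], [2.0]]
    stretched = [[0.0], [0.0], [1.0], [2.0]]  # same contour, first frame held longer
    shifted = [[1.0], [2.0], [3.0]]  # same length, but every value is off by one
    print(soft_dtw(pred, stretched, gamma=1e-3))
    print(soft_dtw(pred, shifted, gamma=1e-3))
```

With a small `gamma`, the stretched target scores approximately 0 because the warping path absorbs the held frame, while the shifted target scores approximately 2 because no alignment removes the value error. Because `softmin` is smooth, the loss is differentiable in the predicted frames; in real training it would be written in an autodiff framework so that gradients reach the predicted durations through the upsampling step. The quadratic double loop here only illustrates the recurrence.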

