Translatotron 2: Robust direct speech-to-speech translation - Zhuanzhi Papers (专知论文)


December 3, 2021

Translatotron 2: Robust direct speech-to-speech translation

Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz

We present Translatotron 2, a neural direct speech-to-speech translation model that can be trained end-to-end. Translatotron 2 consists of a speech encoder, a phoneme decoder, a mel-spectrogram synthesizer, and an attention module that connects all the previous three components. Experimental results suggest that Translatotron 2 outperforms the original Translatotron by a large margin in terms of translation quality and predicted speech naturalness, and drastically improves the robustness of the predicted speech by mitigating over-generation, such as babbling or long pause. We also propose a new method for retaining the source speaker's voice in the translated speech. The trained model is restricted to retain the source speaker's voice, but unlike the original Translatotron, it is not able to generate speech in a different speaker's voice, making the model more robust for production deployment, by mitigating potential misuse for creating spoofing audio artifacts. When the new method is used together with a simple concatenation-based data augmentation, the trained Translatotron 2 model is able to retain each speaker's voice for input with speaker turns.
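The abstract names a "simple concatenation-based data augmentation" without describing it. The sketch below shows one plausible reading, assuming the augmentation joins the source audio, target audio, and target phoneme sequences of two randomly sampled training examples so that the new example contains a speaker turn. The names `Example`, `concat_augment`, and `augment_dataset`, and the representation of audio as plain lists of samples, are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """One hypothetical training example (field names are illustrative)."""
    source_audio: List[float]    # source-language waveform samples
    target_audio: List[float]    # target-language waveform samples
    target_phonemes: List[str]   # phoneme labels for the target speech


def concat_augment(a: Example, b: Example) -> Example:
    """Join two examples end to end, field by field.

    If a and b come from different speakers, the result contains a
    speaker turn in both the source and the target audio.
    """
    return Example(
        source_audio=a.source_audio + b.source_audio,
        target_audio=a.target_audio + b.target_audio,
        target_phonemes=a.target_phonemes + b.target_phonemes,
    )


def augment_dataset(
    examples: List[Example],
    num_new: int,
    rng: Optional[random.Random] = None,
) -> List[Example]:
    """Return the original examples plus num_new concatenated pairs."""
    rng = rng or random.Random()
    augmented = list(examples)
    for _ in range(num_new):
        # Assumption: pairs are drawn uniformly, without replacement.
        a, b = rng.sample(examples, 2)
        augmented.append(concat_augment(a, b))
    return augmented
```

Calling `augment_dataset(examples, num_new=1000)` on a list of at least two examples would extend the training set with 1000 two-utterance examples. How pairs are sampled and what share of the data is augmented are left open here, since the abstract does not say.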
