Direct speech-to-speech translation (S2ST) models suffer from data scarcity, as little parallel S2ST data exists compared with the data available for conventional cascaded systems that combine automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) synthesis. In this work, we explore self-supervised pre-training with unlabeled speech data and data augmentation to tackle this issue. We take advantage of a recently proposed speech-to-unit translation (S2UT) framework that encodes target speech into discrete representations, and transfer pre-training and efficient partial fine-tuning techniques that work well for speech-to-text translation (S2T) to the S2UT domain by studying both speech encoder and discrete unit decoder pre-training. Our experiments on Spanish-English translation show that self-supervised pre-training consistently improves model performance compared with multitask learning, with an average 6.6-12.1 BLEU gain, and that it can be further combined with data augmentation techniques that apply MT to create weakly supervised training data. Audio samples are available at: https://facebookresearch.github.io/speech_translation/enhanced_direct_s2st_units/index.html .
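The core idea of the S2UT framework mentioned above is to replace the continuous target waveform with a sequence of discrete unit ids obtained by clustering self-supervised speech features. The following is a minimal sketch of that discretization step, assuming a fixed codebook standing in for learned k-means centroids and random vectors standing in for per-frame speech features; all names here are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: 200 frames of 16-dim speech features and a 10-entry codebook
# (in practice, features come from a self-supervised model and the codebook
# from k-means clustering over such features).
features = rng.normal(size=(200, 16))
codebook = rng.normal(size=(10, 16))

# Assign each frame to its nearest codebook entry -> one discrete unit per frame.
dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=-1)
units = dists.argmin(axis=1)

# Collapse consecutive repeated units, a common reduction before training
# the unit decoder on the resulting shorter target sequence.
reduced = [int(units[0])]
for u in units[1:]:
    if u != reduced[-1]:
        reduced.append(int(u))

print(len(units), len(reduced))
```

The reduced unit sequence then serves as the discrete translation target; a separately trained unit-to-waveform vocoder maps predicted units back to audio.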