Textless Direct Speech-to-Speech Translation with Discrete Speech Representation

October 31, 2022

Xinjian Li, Ye Jia, Chung-Cheng Chiu

Research on speech-to-speech translation (S2ST) has progressed rapidly in recent years. Many end-to-end systems have been proposed and show advantages over conventional cascade systems, which are often composed of recognition, translation, and synthesis sub-systems. However, most end-to-end systems still rely on intermediate textual supervision during training, which makes them infeasible for languages without written forms. In this work, we propose a novel model, Textless Translatotron, which is based on Translatotron 2, for training an end-to-end direct S2ST model without any textual supervision. Instead of jointly training with an auxiliary task that predicts target phonemes, as in Translatotron 2, the proposed model uses an auxiliary task that predicts discrete speech representations obtained from learned or random speech quantizers. When a speech encoder pre-trained with unsupervised speech data is used for both models, the proposed model obtains translation quality nearly on par with Translatotron 2 on the multilingual CVSS-C corpus as well as the bilingual Fisher Spanish-English corpus. On the latter, it outperforms the prior state-of-the-art textless model by +18.5 BLEU.
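To make the idea of a random speech quantizer more concrete, the following is a minimal sketch of one common way to build such a quantizer: a frozen random projection followed by a nearest-neighbour lookup in a frozen random codebook. It is an illustration under stated assumptions, not the authors' implementation. The function names, the feature dimension (80), the projection dimension (16), and the codebook size (32) are all hypothetical choices made for this example. The resulting integer IDs play the role that phoneme labels play in a text-supervised auxiliary task.

```python
import math
import random


def _normalize(vec):
    """Scale a vector to unit L2 norm (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def make_random_quantizer(feature_dim, proj_dim, codebook_size, seed=0):
    """Build a random projection matrix and a random codebook.

    Both are sampled once and then kept fixed; nothing here is trained.
    All sizes are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    projection = [
        [rng.gauss(0.0, 1.0) for _ in range(feature_dim)]
        for _ in range(proj_dim)
    ]
    codebook = [
        _normalize([rng.gauss(0.0, 1.0) for _ in range(proj_dim)])
        for _ in range(codebook_size)
    ]
    return projection, codebook


def quantize(frames, projection, codebook):
    """Map each feature frame to the index of its nearest codebook entry."""
    ids = []
    for frame in frames:
        # Project the frame, then normalize so distances compare directions.
        projected = _normalize(
            [sum(w * x for w, x in zip(row, frame)) for row in projection]
        )
        # Nearest neighbour by squared Euclidean distance.
        best = min(
            range(len(codebook)),
            key=lambda k: sum(
                (p - c) ** 2 for p, c in zip(projected, codebook[k])
            ),
        )
        ids.append(best)
    return ids


# Stand-in for target-speech features: 5 frames of 80 random values each.
data_rng = random.Random(1)
frames = [[data_rng.gauss(0.0, 1.0) for _ in range(80)] for _ in range(5)]

projection, codebook = make_random_quantizer(80, 16, 32)
targets = quantize(frames, projection, codebook)  # one discrete ID per frame
```

Because the projection and codebook are fixed by the seed, the same audio features always map to the same IDs, so the IDs can serve as stable prediction targets without any text. A learned quantizer would replace the random codebook with one fitted to speech data.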
