Multimedia communication facilitates global interaction among people. Although researchers have explored cross-lingual translation techniques such as machine translation and audio speech translation to overcome language barriers, cross-lingual studies of visual speech remain scarce. This gap is mainly due to the absence of datasets that pair visual speech with translated text. In this paper, we present \textbf{AVMuST-TED}, the first dataset for \textbf{A}udio-\textbf{V}isual \textbf{Mu}ltilingual \textbf{S}peech \textbf{T}ranslation, derived from \textbf{TED} talks. However, visual speech is less distinguishable than audio speech, making it difficult to learn a mapping from source-language phonemes to target-language text. To address this issue, we propose MixSpeech, a cross-modality self-learning framework that uses audio speech to regularize the training of visual speech tasks. To further reduce the cross-modality gap and its impact on knowledge transfer, we introduce mixed speech, created by interpolating audio and visual streams, together with a curriculum learning strategy that adjusts the mixing ratio during training. MixSpeech enhances speech translation in noisy environments, improving BLEU scores for four languages on AVMuST-TED by +1.4 to +4.2. Moreover, it achieves state-of-the-art performance in lip reading on CMLR (11.1\%), LRS2 (25.5\%), and LRS3 (28.0\%).
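To make the mixing idea concrete, the sketch below shows one way the interpolation of audio and visual streams with a curriculum-scheduled ratio could be implemented. It is a minimal illustration only: the assumption of frame-aligned encoder embeddings, the linear annealing schedule, and all function names (\texttt{curriculum\_mix\_ratio}, \texttt{mix\_speech}) are hypothetical and do not reproduce the paper's exact recipe.
\begin{verbatim}
import torch

def curriculum_mix_ratio(step: int, total_steps: int,
                         start: float = 1.0, end: float = 0.0) -> float:
    # Linearly anneal the audio weight from `start` to `end`.
    # Early training leans on the (easier) audio stream; later training
    # relies mostly on the visual stream. The linear schedule is an
    # illustrative assumption, not necessarily the paper's schedule.
    progress = min(step / max(total_steps, 1), 1.0)
    return start + (end - start) * progress

def mix_speech(audio_emb: torch.Tensor, visual_emb: torch.Tensor,
               ratio: float) -> torch.Tensor:
    # Interpolate frame-aligned audio and visual embeddings of shape
    # (batch, frames, dim); `ratio` is the weight on the audio stream.
    assert audio_emb.shape == visual_emb.shape
    return ratio * audio_emb + (1.0 - ratio) * visual_emb

if __name__ == "__main__":
    # Toy example: 4 utterances, 50 aligned frames, 256-dim features.
    audio = torch.randn(4, 50, 256)
    visual = torch.randn(4, 50, 256)
    for step in (0, 5000, 10000):
        r = curriculum_mix_ratio(step, total_steps=10000)
        mixed = mix_speech(audio, visual, r)
        print(step, r, mixed.shape)
\end{verbatim}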