This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task. We participated in the Swahili-English direction and achieved the best sacreBLEU score (25.3) among all participants. Our constrained system is based on a pipeline framework, i.e., ASR followed by NMT. We trained our models with the officially provided ASR and MT datasets. The ASR system is based on the open-source toolkit Kaldi, and this work mainly explores how to make the most of the NMT models. To reduce the punctuation errors generated by the ASR model, we employ our previous work, SlotRefine, to train a punctuation correction model. To achieve better translation performance, we explored the most recent effective strategies, including back-translation, knowledge distillation, multi-feature reranking, and transductive finetuning. For the model structure, we tried both autoregressive and non-autoregressive models. In addition, we proposed two novel pre-training approaches, i.e., \textit{de-noising training} and \textit{bidirectional training}, to fully exploit the data. Extensive experiments show that adding the above techniques consistently improves the BLEU scores, and the final submission system outperforms the baseline (a Transformer ensemble model trained with the original parallel data) by approximately 10.8 BLEU, achieving state-of-the-art performance.
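To make the cascaded structure concrete, the following is a minimal Python sketch of the pipeline the abstract describes: ASR, then punctuation correction, then NMT with n-best reranking. All function names and bodies here are hypothetical placeholders standing in for the Kaldi ASR system, the SlotRefine-style punctuation model, and the NMT ensemble; they are not the authors' actual implementation.

\begin{verbatim}
from typing import List

# Hypothetical stand-ins for the pipeline stages described above.

def transcribe(audio_path: str) -> str:
    """Kaldi-based ASR: audio -> raw Swahili transcript (placeholder)."""
    return "habari za asubuhi"

def restore_punctuation(text: str) -> str:
    """SlotRefine-style punctuation correction of the ASR output
    (placeholder)."""
    return text.capitalize() + "."

def translate_nbest(text: str, n: int = 5) -> List[str]:
    """NMT ensemble producing an n-best list of English hypotheses
    (placeholder)."""
    return ["Good morning."] * n

def rerank(hypotheses: List[str]) -> str:
    """Multi-feature reranking: select the best hypothesis
    (placeholder)."""
    return hypotheses[0]

def speech_translate(audio_path: str) -> str:
    """Cascaded ST: ASR -> punctuation correction -> NMT -> reranking."""
    transcript = transcribe(audio_path)
    punctuated = restore_punctuation(transcript)
    return rerank(translate_nbest(punctuated))

if __name__ == "__main__":
    print(speech_translate("example.wav"))
\end{verbatim}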