快速MD: 使用非自动递后隐藏介质快速多Decer 端对端语音翻译 (Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates) - 专知论文

会员服务 ·

0

解码 · 语音翻译 · 语音识别 · FAST · MoDELS ·

2021 年 9 月 27 日

Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates

翻译：快速MD: 使用非自动递后隐藏介质快速多Decer 端对端语音翻译

Hirofumi Inaguma,Siddharth Dalmia,Brian Yan,Shinji Watanabe

from arxiv, Accepted at IEEE ASRU 2021

The multi-decoder (MD) end-to-end speech translation model has demonstrated high translation quality by searching for better intermediate automatic speech recognition (ASR) decoder states as hidden intermediates (HI). It is a two-pass decoding model decomposing the overall task into ASR and machine translation sub-tasks. However, the decoding speed is not fast enough for real-world applications because it conducts beam search for both sub-tasks during inference. We propose Fast-MD, a fast MD model that generates HI by non-autoregressive (NAR) decoding based on connectionist temporal classification (CTC) outputs followed by an ASR decoder. We investigated two types of NAR HI: (1) parallel HI by using an autoregressive Transformer ASR decoder and (2) masked HI by using Mask-CTC, which combines CTC and the conditional masked language model. To reduce a mismatch in the ASR decoder between teacher-forcing during training and conditioning on CTC outputs during testing, we also propose sampling CTC outputs during training. Experimental evaluations on three corpora show that Fast-MD achieved about 2x and 4x faster decoding speed than that of the na\"ive MD model on GPU and CPU with comparable translation quality. Adopting the Conformer encoder and intermediate CTC loss further boosts its quality without sacrificing decoding speed.

翻译：多解码器(MD) 端对端语音翻译模式(MD) 通过寻找更好的中间自动语音识别(ASR) 解码器(HI) 的隐藏中间代号(ASR) 解码器(DHI), 显示了高翻译质量。这是一个双通解码模式, 将总体任务分解成 ASR 和机器翻译子任务。然而, 解码速度对于真实世界应用程序来说不够快, 因为它在推断过程中对两个子任务进行光束搜索。我们提议快速MD, 快速解码模型, 快速解码模型, 以非自动递增(NAR) 时间分类(CTC) 的中间解码生成 HI, 并随后为 ASR 解码器。我们调查了两种类型的NAR HI : (1) 通过使用自动递增变压变压器 ASR 解码器(ASR) 解码器和 (2) 掩码 HIHI, 因为它结合了CTC和蒙蔽语言模式。为了减少教师在测试期间对教师的分解和调时的中间解速度,我们提议在培训期间对 CROCEB公司质量进行抽样分析, 4的快速评估, 和加速分析。

0

相关内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

图像分割二十年，盘点影响力最大的10篇论文

专知会员服务

84+阅读 · 2020年9月27日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

【Google】无监督机器翻译，Unsupervised Machine Translation

【Google】无监督机器翻译，Unsupervised Machine Translation

专知会员服务

36+阅读 · 2020年3月3日

【上海交大-ICASSP2020】Transformer端到端的多说话人语音识别

【上海交大-ICASSP2020】Transformer端到端的多说话人语音识别

专知会员服务

51+阅读 · 2020年2月16日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

专知会员服务

11+阅读 · 2019年12月28日

【NeurIPS2019】高性能浅层RNN的类脑目标识别（Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs）

【NeurIPS2019】高性能浅层RNN的类脑目标识别（Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs）

专知会员服务

13+阅读 · 2019年12月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

3+阅读 · 2019年4月12日

Character Transformations for Non-Autoregressive GEC Tagging

Arxiv

0+阅读 · 2021年11月17日

End-to-End Dense Video Captioning with Parallel Decoding

Arxiv

0+阅读 · 2021年11月17日

High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency

Arxiv

0+阅读 · 2021年11月17日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Attention Forcing for Sequence-to-sequence Model Training

Attention Forcing for Sequence-to-sequence Model Training

Arxiv

7+阅读 · 2019年9月26日

FastSpeech: Fast, Robust and Controllable Text to Speech

FastSpeech: Fast, Robust and Controllable Text to Speech

Arxiv

3+阅读 · 2019年5月22日

Neural Speech Synthesis with Transformer Network

Neural Speech Synthesis with Transformer Network

Arxiv

5+阅读 · 2019年1月30日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Arxiv

5+阅读 · 2018年6月4日

Recursive Neural Network Based Preordering for English-to-Japanese Machine Translation

Arxiv

7+阅读 · 2018年5月25日

VIP会员

文章信息

相关主题

相关VIP内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

图像分割二十年，盘点影响力最大的10篇论文

专知会员服务

84+阅读 · 2020年9月27日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

【Google】无监督机器翻译，Unsupervised Machine Translation

【Google】无监督机器翻译，Unsupervised Machine Translation

专知会员服务

36+阅读 · 2020年3月3日

【上海交大-ICASSP2020】Transformer端到端的多说话人语音识别

【上海交大-ICASSP2020】Transformer端到端的多说话人语音识别

专知会员服务

51+阅读 · 2020年2月16日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

【ICLR2020】理解非自回归机器翻译中的知识蒸馏（Understanding Knowledge Distillation in Non-autoregressive Machine Translation）

专知会员服务

11+阅读 · 2019年12月28日

【NeurIPS2019】高性能浅层RNN的类脑目标识别（Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs）

【NeurIPS2019】高性能浅层RNN的类脑目标识别（Brain-Like Object Recognition with High-Performing Shallow Recurrent ANNs）

专知会员服务

13+阅读 · 2019年12月13日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

《美空军条令出版物：战略打击》最新条令

《高能激光武器》22页slides

军事前沿模型

《面向小型无人机或无人飞行器的创新雷达探测与人工智能分类技术》263页

相关资讯

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

已删除

将门创投

3+阅读 · 2019年4月12日

相关论文

Character Transformations for Non-Autoregressive GEC Tagging

Arxiv

0+阅读 · 2021年11月17日

End-to-End Dense Video Captioning with Parallel Decoding

Arxiv

0+阅读 · 2021年11月17日

High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency

Arxiv

0+阅读 · 2021年11月17日

End-to-End Multi-speaker Speech Recognition with Transformer

Arxiv

8+阅读 · 2020年2月13日

Attention Forcing for Sequence-to-sequence Model Training

Attention Forcing for Sequence-to-sequence Model Training

Arxiv

7+阅读 · 2019年9月26日

FastSpeech: Fast, Robust and Controllable Text to Speech

FastSpeech: Fast, Robust and Controllable Text to Speech

Arxiv

3+阅读 · 2019年5月22日

Neural Speech Synthesis with Transformer Network

Neural Speech Synthesis with Transformer Network

Arxiv

5+阅读 · 2019年1月30日

End-to-end Speech Recognition with Word-based RNN Language Models

End-to-end Speech Recognition with Word-based RNN Language Models

Arxiv

3+阅读 · 2018年8月8日

Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

Arxiv

5+阅读 · 2018年6月4日

Recursive Neural Network Based Preordering for English-to-Japanese Machine Translation

Arxiv

7+阅读 · 2018年5月25日

微信扫码咨询专知VIP会员