This work describes an encoder pre-training procedure that uses frame-wise labels to improve the training of a streaming recurrent neural network transducer (RNN-T) model. A streaming RNN-T trained from scratch usually performs worse than a non-streaming RNN-T. Although it is common to address this issue by pre-training components of the RNN-T with other criteria or with frame-wise alignment guidance, such alignments are not readily available in an end-to-end framework. In this work, the frame-wise alignment used to pre-train the streaming RNN-T's encoder is generated without an HMM-based system, yielding an all-neural framework with HMM-free encoder pre-training. The alignment is obtained by expanding the spikes of a CTC model to their neighboring left/right blank frames, and two expanding strategies are proposed. To the best of our knowledge, this is the first work to simulate HMM-based frame-wise labels with a CTC model for pre-training. Experiments on the LibriSpeech and MLS English tasks show that, compared with random initialization, the proposed pre-training procedure reduces the WER by a relative 5%~11% and the emission latency by 60 ms. Moreover, the method is lexicon-free and is therefore readily applied to new languages without a manually designed lexicon.
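The spike-expansion idea can be sketched in a few lines. The snippet below is a hypothetical illustration, not the paper's implementation: given a per-frame CTC label sequence where most frames are blank and each non-blank "spike" frame carries a label, the blank frames are reassigned to a neighboring spike's label. The `strategy` argument mimics the two expanding directions mentioned above: `"left"` lets each spike claim the blank frames before it, `"right"` the blank frames after it; the function name, blank id, and boundary handling are all assumptions.

```python
BLANK = 0  # assumed blank label id

def expand_spikes(frame_labels, strategy="right"):
    """Densify sparse CTC spike outputs into a frame-wise alignment.

    Hypothetical sketch: each non-blank spike frame's label is spread
    over the adjacent blank frames, either to the left or to the right,
    up to the previous/next spike. Leading/trailing blanks fall back to
    the nearest spike's label.
    """
    T = len(frame_labels)
    spikes = [t for t, lab in enumerate(frame_labels) if lab != BLANK]
    dense = list(frame_labels)
    if not spikes:
        return dense  # no spikes: nothing to expand
    if strategy == "right":
        # each spike fills the blank frames that follow it
        for i, t in enumerate(spikes):
            end = spikes[i + 1] if i + 1 < len(spikes) else T
            for u in range(t, end):
                dense[u] = frame_labels[t]
        # blanks before the first spike take the first spike's label
        for u in range(spikes[0]):
            dense[u] = frame_labels[spikes[0]]
    else:  # "left"
        # each spike fills the blank frames that precede it
        for i, t in enumerate(spikes):
            start = spikes[i - 1] + 1 if i > 0 else 0
            for u in range(start, t + 1):
                dense[u] = frame_labels[t]
        # blanks after the last spike take the last spike's label
        for u in range(spikes[-1] + 1, T):
            dense[u] = frame_labels[spikes[-1]]
    return dense
```

With spikes for labels 3 and 5 at frames 2 and 5 of a 7-frame utterance, `"right"` expansion yields `[3, 3, 3, 3, 3, 5, 5]` and `"left"` expansion yields `[3, 3, 3, 5, 5, 5, 5]`; either dense sequence can then serve as a frame-wise cross-entropy target for encoder pre-training.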