Integrating Lattice-Free MMI into End-to-End Speech Recognition - 专知 (Zhuanzhi) Papers


E2E · Speech Recognition · Performer · Discriminator · Decoding

August 23, 2022

Integrating Lattice-Free MMI into End-to-End Speech Recognition


Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu

From arXiv; in IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022

In automatic speech recognition (ASR) research, discriminative criteria have achieved superior performance in DNN-HMM systems. Given this success, the adoption of discriminative criteria is promising to boost the performance of end-to-end (E2E) ASR systems. With this motivation, previous works have introduced the minimum Bayesian risk (MBR, one of the discriminative criteria) into E2E ASR systems. However, the effectiveness and efficiency of the MBR-based methods are compromised: the MBR criterion is only used in system training, which creates a mismatch between training and decoding; the on-the-fly decoding process in MBR-based methods results in the need for pre-trained models and slow training speeds. To this end, novel algorithms are proposed in this work to integrate another widely used discriminative criterion, lattice-free maximum mutual information (LF-MMI), into E2E ASR systems not only in the training stage but also in the decoding process. The proposed LF-MMI training and decoding methods show their effectiveness on two widely used E2E frameworks: Attention-Based Encoder-Decoders (AEDs) and Neural Transducers (NTs). Compared with MBR-based methods, the proposed LF-MMI method: maintains the consistency between training and decoding; eschews the on-the-fly decoding process; trains from randomly initialized models with superior training efficiency. Experiments suggest that the LF-MMI method outperforms its MBR counterparts and consistently leads to statistically significant performance improvements on various frameworks and datasets from 30 hours to 14.3k hours. The proposed method achieves state-of-the-art (SOTA) results on Aishell-1 (CER 4.10%) and Aishell-2 (CER 5.02%) datasets. Code is released.
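To make the criterion named in the abstract concrete, the sketch below computes the general MMI objective on a toy hypothesis list: the log-posterior of the reference transcript, i.e. its joint score log p(O|W)P(W) minus the log-sum of the joint scores of all competing hypotheses. This is only an illustration under stated assumptions, not the authors' released code. In lattice-free MMI the denominator is obtained with the forward algorithm over a denominator graph rather than by enumerating hypotheses, so the explicit list here is a simplification. The function names `mmi_objective` and `rescore_nbest`, the interpolation weight `beta`, and all scores are hypothetical; the rescoring step is one plausible way to use an MMI-style score at decoding time, not necessarily the scheme used in the paper.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mmi_objective(joint_log_scores, ref_index):
    """MMI objective for one utterance on an explicit hypothesis list.

    joint_log_scores[i] is log p(O | W_i) + log P(W_i) for hypothesis W_i.
    The reference must be one of the hypotheses (index ref_index).
    Returns log P(W_ref | O) under this restricted hypothesis set.
    """
    return joint_log_scores[ref_index] - logsumexp(joint_log_scores)


def rescore_nbest(e2e_log_probs, mmi_log_posteriors, beta=0.2):
    """Hypothetical N-best rescoring: add a weighted MMI-style score
    to each end-to-end log-probability and pick the best hypothesis.

    beta is an assumed interpolation weight, not a value from the paper.
    """
    combined = [a + beta * b for a, b in zip(e2e_log_probs, mmi_log_posteriors)]
    best = max(range(len(combined)), key=combined.__getitem__)
    return best, combined


if __name__ == "__main__":
    # Toy joint scores for three competing transcripts (made-up numbers).
    scores = [math.log(0.6), math.log(0.3), math.log(0.1)]
    print("MMI objective for reference 0:", mmi_objective(scores, 0))

    # Toy rescoring of a 2-best list (made-up numbers).
    best, combined = rescore_nbest([-1.0, -1.1], [-3.0, -0.1], beta=0.5)
    print("best hypothesis index:", best, "combined scores:", combined)
```

Maximizing this quantity raises the reference score relative to all competitors, which is what makes the criterion discriminative. Unlike MBR training as described in the abstract, it needs no on-the-fly decoding to produce hypotheses once the denominator is computed over a fixed graph.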

