Optimizing modern ASR architectures is a high-priority task, since it saves substantial computational resources during model training and inference. This work proposes a new Uconv-Conformer architecture based on the standard Conformer model. It progressively reduces the input sequence length by a factor of 16, which speeds up the intermediate layers. To address the convergence issues caused by such a strong reduction of the time dimension, we use upsampling blocks, as in the U-Net architecture, to ensure correct CTC loss computation and to stabilize network training. The Uconv-Conformer architecture is not only faster in training and inference but also achieves a better WER than the baseline Conformer. Our best Uconv-Conformer model shows 47.8% and 23.5% inference acceleration on CPU and GPU, respectively, with relative WER reductions of 7.3% and 9.2% on LibriSpeech test_clean and test_other.
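A minimal PyTorch sketch of this scheme, not the authors' implementation: plain Transformer encoder layers stand in for Conformer blocks, and the layer counts, dimensions, and the final 4x output frame rate are illustrative assumptions. It shows the 16x stride-2 downsampling, the intermediate blocks running on the shortened sequence, and U-Net-style upsampling with skip connections that restores enough frames for CTC.

```python
import torch
import torch.nn as nn

class UconvEncoderSketch(nn.Module):
    def __init__(self, n_mels=80, d_model=256, vocab_size=1000):
        super().__init__()
        # Four stride-2 convolutions give a 16x reduction of the time axis.
        self.down = nn.ModuleList([
            nn.Conv1d(n_mels if i == 0 else d_model, d_model,
                      kernel_size=3, stride=2, padding=1)
            for i in range(4)
        ])
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           dim_feedforward=1024,
                                           batch_first=True)
        # The intermediate blocks run on the 16x-shorter sequence,
        # which is where the speed-up comes from.
        self.blocks = nn.TransformerEncoder(layer, num_layers=6)
        # Two stride-2 transposed convolutions upsample back to a 4x
        # reduction (an assumed target rate), restoring frames for CTC.
        self.up = nn.ModuleList([
            nn.ConvTranspose1d(d_model, d_model, kernel_size=2, stride=2)
            for _ in range(2)
        ])
        self.ctc_head = nn.Linear(d_model, vocab_size)

    def forward(self, feats):                    # feats: (B, T, n_mels)
        x = feats.transpose(1, 2)                # -> (B, n_mels, T)
        skips = []
        for conv in self.down:
            x = torch.relu(conv(x))
            skips.append(x)                      # features at T/2 .. T/16
        x = self.blocks(x.transpose(1, 2)).transpose(1, 2)
        # U-Net-style skip connections: pair each upsampling step with
        # the matching-resolution downsampling output (T/8, then T/4).
        for up, skip in zip(self.up, [skips[2], skips[1]]):
            x = up(x)
            n = min(x.size(-1), skip.size(-1))   # crop any padding mismatch
            x = x[..., :n] + skip[..., :n]
        return self.ctc_head(x.transpose(1, 2))  # (B, ~T/4, vocab) logits
```

Under these assumptions, the output logits at roughly T/4 frames can be passed through a log-softmax over the vocabulary dimension and fed to torch.nn.CTCLoss; the skip connections give the upsampled frames access to higher-resolution features, which is what stabilizes training after the aggressive 16x reduction.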