Uconv-Connex: 大量减少终端至终端语音识别输入序列长度</s> (Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition) - 专知论文

会员服务 ·

0

Conformer · 推断 · 语音识别 · MoDELS · 端到端 ·

2023 年 3 月 11 日

Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition

翻译：Uconv-Connex: 大量减少终端至终端语音识别输入序列长度

Andrei Andrusenko,Rauf Nasretdinov,Aleksei Romanenko

from arxiv, 5 pages, 1 figure, accepted by ICASSP 2023

Optimization of modern ASR architectures is among the highest priority tasks since it saves many computational resources for model training and inference. The work proposes a new Uconv-Conformer architecture based on the standard Conformer model. It consistently reduces the input sequence length by 16 times, which results in speeding up the work of the intermediate layers. To solve the convergence issue connected with such a significant reduction of the time dimension, we use upsampling blocks like in the U-Net architecture to ensure the correct CTC loss calculation and stabilize network training. The Uconv-Conformer architecture appears to be not only faster in terms of training and inference speed but also shows better WER compared to the baseline Conformer. Our best Uconv-Conformer model shows 47.8% and 23.5% inference acceleration on the CPU and GPU, respectively. Relative WER reduction is 7.3% and 9.2% on LibriSpeech test_clean and test_other respectively.

翻译：优化现代 ASR 架构是最优先的任务之一,因为它为模型培训和推断节省了许多计算资源。工作提议基于标准 Conv 模式建立一个新的 Uconv- Connecter 架构。它一贯将输入序列长度减少16次, 从而加快中间层的工作。要解决与大量减少时间维度相关的趋同问题, 我们使用U-Net 架构中的抽查块来确保正确的计算损失计算和稳定网络培训。 Uconv- Connecter 架构不仅在培训和推断速度方面速度更快,而且显示与基线 Confer 相比, WER也更好。我们最好的 Uconv- Connecter 模型分别显示在 CPU 和 GPU 上加速了47.8% 和23.5% 。在 LibriSpeech 测试- clean and test_ other上, 相对的WER降幅分别为7.3%和9.2%。</s>

0

相关内容

Conformer

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【推荐】深度学习目标检测全面综述

【推荐】深度学习目标检测全面综述

机器学习研究会

21+阅读 · 2017年9月13日

【推荐】全卷积语义分割综述

【推荐】全卷积语义分割综述

机器学习研究会

19+阅读 · 2017年8月31日

回声干扰抑制中的自适应信号处理算法研究

国家自然科学基金

1+阅读 · 2015年12月31日

类风湿关节炎虚、实证候分类的生物基础研究

国家自然科学基金

0+阅读 · 2013年12月31日

纤维结构不良中破骨细胞过度激活的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

求解具有张量积结构系统的算法研究

国家自然科学基金

1+阅读 · 2012年12月31日

可变带宽交换光网络的自适应机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

云计算环境下数据中心的power capping关键问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

鲍鱼脏器多糖AHP-2的一级结构及胆囊收缩素分泌刺激活性机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

多重刺激响应的纤维素基接枝共聚物结构与响应性

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

Erbin在细胞分裂周期中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

A novel semantic-functional approach for multiuser event-trigger communication

Arxiv

0+阅读 · 2023年5月3日

On the convergence of reduction-based and model-based methods in proof theory

Arxiv

0+阅读 · 2023年5月2日

Comments on "Channel Coding Rate in the Finite Blocklength Regime": On the Quadratic Decaying Property of the Information Rate Function

Arxiv

0+阅读 · 2023年4月30日

Sequential Predictive Two-Sample and Independence Testing

Arxiv

0+阅读 · 2023年4月29日

MASK-CNN-Transformer For Real-Time Multi-Label Weather Recognition

Arxiv

0+阅读 · 2023年4月28日

Differentiable Sensor Layouts for End-to-End Learning of Task-Specific Camera Parameters

Arxiv

0+阅读 · 2023年4月28日

DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation

Arxiv

0+阅读 · 2023年4月28日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation

Arxiv

12+阅读 · 2018年9月27日

W-net: Bridged U-net for 2D Medical Image Segmentation

W-net: Bridged U-net for 2D Medical Image Segmentation

Arxiv

20+阅读 · 2018年7月12日

VIP会员

文章信息

相关主题

相关VIP内容

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

NLP必读经典文献100篇

专知会员服务

124+阅读 · 2020年9月8日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

小规模训练指南：打造世界级大语言模型的关键方法

无人机编队飞行：复杂环境中作战的策略、挑战与应用

大模型APP，AI时代第一个爆款

从数据中心视角出发的高效大语言模型训练综述

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

全球人工智能

20+阅读 · 2017年12月17日

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

【推荐】ResNet, AlexNet, VGG, Inception：各种卷积网络架构的理解

机器学习研究会

20+阅读 · 2017年12月17日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

【推荐】深度学习目标检测全面综述

【推荐】深度学习目标检测全面综述

机器学习研究会

21+阅读 · 2017年9月13日

【推荐】全卷积语义分割综述

【推荐】全卷积语义分割综述

机器学习研究会

19+阅读 · 2017年8月31日

相关论文

A novel semantic-functional approach for multiuser event-trigger communication

Arxiv

0+阅读 · 2023年5月3日

On the convergence of reduction-based and model-based methods in proof theory

Arxiv

0+阅读 · 2023年5月2日

Comments on "Channel Coding Rate in the Finite Blocklength Regime": On the Quadratic Decaying Property of the Information Rate Function

Arxiv

0+阅读 · 2023年4月30日

Sequential Predictive Two-Sample and Independence Testing

Arxiv

0+阅读 · 2023年4月29日

MASK-CNN-Transformer For Real-Time Multi-Label Weather Recognition

Arxiv

0+阅读 · 2023年4月28日

Differentiable Sensor Layouts for End-to-End Learning of Task-Specific Camera Parameters

Arxiv

0+阅读 · 2023年4月28日

DIAMANT: Dual Image-Attention Map Encoders For Medical Image Segmentation

Arxiv

0+阅读 · 2023年4月28日

Meta Learning for End-to-End Low-Resource Speech Recognition

Meta Learning for End-to-End Low-Resource Speech Recognition

Arxiv

20+阅读 · 2019年10月26日

nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation

Arxiv

12+阅读 · 2018年9月27日

W-net: Bridged U-net for 2D Medical Image Segmentation

W-net: Bridged U-net for 2D Medical Image Segmentation

Arxiv

20+阅读 · 2018年7月12日

相关基金

回声干扰抑制中的自适应信号处理算法研究

国家自然科学基金

1+阅读 · 2015年12月31日

类风湿关节炎虚、实证候分类的生物基础研究

国家自然科学基金

0+阅读 · 2013年12月31日

纤维结构不良中破骨细胞过度激活的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

求解具有张量积结构系统的算法研究

国家自然科学基金

1+阅读 · 2012年12月31日

可变带宽交换光网络的自适应机理研究

国家自然科学基金

0+阅读 · 2012年12月31日

云计算环境下数据中心的power capping关键问题研究

国家自然科学基金

0+阅读 · 2012年12月31日

鲍鱼脏器多糖AHP-2的一级结构及胆囊收缩素分泌刺激活性机理研究

国家自然科学基金

0+阅读 · 2011年12月31日

多重刺激响应的纤维素基接枝共聚物结构与响应性

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

Erbin在细胞分裂周期中的作用

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员