Optimizing modern ASR architectures is a high-priority task, since it saves substantial computational resources for both model training and inference. This work proposes a new Uconv-Conformer architecture, based on the standard Conformer model, that progressively reduces the input sequence length by a factor of 16, which speeds up the intermediate layers. To solve the convergence problem caused by such a significant reduction of the time dimension, we use upsampling blocks similar to those in the U-Net architecture, which ensure correct CTC loss calculation and stabilize network training. The Uconv-Conformer architecture is not only faster in training and inference but also achieves a better WER than the baseline Conformer. Our best Uconv-Conformer model showed a 40.3% reduction in epoch training time, along with 47.8% and 23.5% inference acceleration on CPU and GPU, respectively. The relative WER on LibriSpeech test_clean and test_other decreased by 7.3% and 9.2%, respectively.
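As a rough illustration of the upsampling idea described above (a minimal sketch, not the authors' implementation), the following PyTorch snippet shows a U-Net-style upsampling block: a transposed convolution doubles the time resolution of the downsampled features, and a skip connection saved before the matching downsampling stage is fused back in before the CTC head. The module name `Upsample1d` and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Upsample1d(nn.Module):
    """Doubles the time dimension and fuses a skip connection, U-Net style.

    Hypothetical module for illustration; not the paper's code.
    """
    def __init__(self, dim: int):
        super().__init__()
        # Transposed conv with stride 2 doubles the sequence length.
        self.up = nn.ConvTranspose1d(dim, dim, kernel_size=2, stride=2)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        # x:    (batch, time, dim)   coarse features after downsampling
        # skip: (batch, 2*time, dim) features cached before that downsampling
        x = self.up(x.transpose(1, 2)).transpose(1, 2)  # (batch, 2*time, dim)
        x = x[:, : skip.size(1)]                        # guard against odd lengths
        return self.norm(x + skip)                      # additive skip fusion

# Usage: upsample 16x-downsampled features back toward the frame rate
# needed for a well-conditioned CTC alignment (sizes are made up).
dim, batch = 256, 4
coarse = torch.randn(batch, 25, dim)   # after 16x time reduction
skip = torch.randn(batch, 50, dim)     # cached at the 8x stage
out = Upsample1d(dim)(coarse, skip)
print(out.shape)  # torch.Size([4, 50, 256])
```

Restoring the time resolution this way matters for CTC: the loss requires the output sequence to be at least as long as the label sequence, so aggressive 16x downsampling alone can make alignments infeasible or training unstable.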