Simplifying the Transformer网络——基于多路复用技术的语音识别 (Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition) - 专知论文

会员服务 ·

0

多路复用技术 · 复用技术 · 基线 · Transformer · 语音识别 ·

2023 年 4 月 11 日

Sim-T: Simplify the Transformer Network by Multiplexing Technique for Speech Recognition

翻译：Simplifying the Transformer网络——基于多路复用技术的语音识别

Guangyong Wei,Zhikui Duan,Shiren Li,Guangguang Yang,Xinmei Yu,Junhua Li

In recent years, a great deal of attention has been paid to the Transformer network for speech recognition tasks due to its excellent model performance. However, the Transformer network always involves heavy computation and large number of parameters, causing serious deployment problems in devices with limited computation sources or storage memory. In this paper, a new lightweight model called Sim-T has been proposed to expand the generality of the Transformer model. Under the help of the newly developed multiplexing technique, the Sim-T can efficiently compress the model with negligible sacrifice on its performance. To be more precise, the proposed technique includes two parts, that are, module weight multiplexing and attention score multiplexing. Moreover, a novel decoder structure has been proposed to facilitate the attention score multiplexing. Extensive experiments have been conducted to validate the effectiveness of Sim-T. In Aishell-1 dataset, when the proposed Sim-T is 48% parameter less than the baseline Transformer, 0.4% CER improvement can be obtained. Alternatively, 69% parameter reduction can be achieved if the Sim-T gives the same performance as the baseline Transformer. With regard to the HKUST and WSJ eval92 datasets, CER and WER will be improved by 0.3% and 0.2%, respectively, when parameters in Sim-T are 40% less than the baseline Transformer.

翻译：近年来，Transformer网络在语音识别任务中的出色性能备受关注。然而，Transformer网络通常需要大量计算和参数，导致在计算资源或存储内存有限的设备上存在严重的部署问题。因此，在本文中，我们提出了一个名为Sim-T的新型轻量级模型，以扩展Transformer模型的通用性。借助新开发的多路复用技术，Sim-T可以高效地压缩模型，并在性能上几乎不会做出任何牺牲。具体而言，所提出的技术包括模块权重多路复用和注意力分数多路复用两部分。此外，为促进注意力分数多路复用，还提出了一种新的解码器结构。我们进行了广泛的实验，以验证Sim-T的有效性。在Aishell-1数据集中，当提出的Sim-T的参数比基线Transformer低48％时，可以获得0.4％的CER改进。另外，如果Sim-T的性能与基线Transformer相同，则可以实现69％的参数减少。对于HKUST和WSJ eval92数据集，当Sim-T的参数比基线Transformer低40％时，CER和WER分别提高了0.3％和0.2％。

0

相关内容

多路复用技术

多路复用技术

【CVPR 2022】NUS&字节跳动提出Shunted Transformer：多尺度Token叠加

【CVPR 2022】NUS&字节跳动提出Shunted Transformer：多尺度Token叠加

专知会员服务

16+阅读 · 2022年4月8日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【2020关键词提取】使用多个本地功能从单个文档中提取关键字，YAKE! Keyword extraction from single documents using multiple local features

【2020关键词提取】使用多个本地功能从单个文档中提取关键字，YAKE! Keyword extraction from single documents using multiple local features

专知会员服务

26+阅读 · 2020年5月2日

【SIGMOD2020】知识图谱补全方法的现实再评价，Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

【SIGMOD2020】知识图谱补全方法的现实再评价，Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

专知会员服务

33+阅读 · 2020年3月23日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【北航】深度学习编译器综述|The Deep Learning Compiler: A Comprehensive Survey

【北航】深度学习编译器综述|The Deep Learning Compiler: A Comprehensive Survey

专知会员服务

38+阅读 · 2020年2月11日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

【泡泡一分钟】基于运动估计的激光雷达和相机标定方法

【泡泡一分钟】基于运动估计的激光雷达和相机标定方法

泡泡机器人SLAM

25+阅读 · 2019年1月17日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

误差反向传播——CNN

误差反向传播——CNN

统计学习与视觉计算组

30+阅读 · 2018年7月12日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

专知

33+阅读 · 2018年4月23日

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

专知

17+阅读 · 2018年4月19日

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

专知

29+阅读 · 2018年3月12日

【CNN】一文读懂卷积神经网络CNN

【CNN】一文读懂卷积神经网络CNN

产业智能官

18+阅读 · 2018年1月2日

【推荐】(Keras)LSTM多元时序预测教程

【推荐】(Keras)LSTM多元时序预测教程

机器学习研究会

24+阅读 · 2017年8月14日

零辅助数据MIMO雷达自适应检测问题研究

国家自然科学基金

7+阅读 · 2015年12月31日

面向无线多媒体传感器网络的高效压缩视频感知

国家自然科学基金

0+阅读 · 2015年12月31日

不相交QoS路径与斯坦纳网络的近似算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于深度神经网络的噪声鲁棒性语音识别方法研究

国家自然科学基金

4+阅读 · 2013年12月31日

2.13微米脉冲单频激光器泵浦的窄线宽中红外ZGP-OPO研究

国家自然科学基金

0+阅读 · 2013年12月31日

语音识别中的稀疏性深度学习

国家自然科学基金

11+阅读 · 2012年12月31日

基于压缩网络编码的无线传感网隐私与安全机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

具有自主产权的安诚嵌入式处理器上支持AES及GF(2^n)运算的指令扩展结构研究

国家自然科学基金

0+阅读 · 2012年12月31日

无线传感器网络隐藏节点研究

国家自然科学基金

0+阅读 · 2012年12月31日

缓存技术在无线传感器网络中的研究与应用

国家自然科学基金

0+阅读 · 2009年12月31日

TReR: A Lightweight Transformer Re-Ranking Approach for 3D LiDAR Place Recognition

Arxiv

0+阅读 · 2023年5月29日

Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network

Arxiv

0+阅读 · 2023年5月29日

ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition

Arxiv

0+阅读 · 2023年5月28日

PowerGAN: A Machine Learning Approach for Power Side-Channel Attack on Compute-in-Memory Accelerators

Arxiv

0+阅读 · 2023年5月27日

Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition

Arxiv

0+阅读 · 2023年5月27日

Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person

Arxiv

0+阅读 · 2023年5月26日

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

Arxiv

0+阅读 · 2023年5月25日

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

Arxiv

0+阅读 · 2023年5月25日

Deep Generative Model for Simultaneous Range Error Mitigation and Environment Identification

Arxiv

0+阅读 · 2023年5月23日

End-to-End Multi-Task Learning with Attention

Arxiv

19+阅读 · 2018年3月28日

VIP会员

文章信息

相关主题

多路复用技术

相关VIP内容

【CVPR 2022】NUS&字节跳动提出Shunted Transformer：多尺度Token叠加

【CVPR 2022】NUS&字节跳动提出Shunted Transformer：多尺度Token叠加

专知会员服务

16+阅读 · 2022年4月8日

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

纽约大学最新《语音识别Speech Recognition》2020课程，不可错过！

专知会员服务

44+阅读 · 2020年11月2日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【2020关键词提取】使用多个本地功能从单个文档中提取关键字，YAKE! Keyword extraction from single documents using multiple local features

【2020关键词提取】使用多个本地功能从单个文档中提取关键字，YAKE! Keyword extraction from single documents using multiple local features

专知会员服务

26+阅读 · 2020年5月2日

【SIGMOD2020】知识图谱补全方法的现实再评价，Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

【SIGMOD2020】知识图谱补全方法的现实再评价，Realistic Re-evaluation of Knowledge Graph Completion Methods: An Experimental Study

专知会员服务

33+阅读 · 2020年3月23日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【北航】深度学习编译器综述|The Deep Learning Compiler: A Comprehensive Survey

【北航】深度学习编译器综述|The Deep Learning Compiler: A Comprehensive Survey

专知会员服务

38+阅读 · 2020年2月11日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《复杂工程系统模型驱动设计决策支持系统：早期设计阶段挑战》最新138页

《日本陆上自卫队2040年作战方式与未来作战研究》最新23页slides

人工智能作为战争武器

《后勤保障》最新23页

相关资讯

【泡泡一分钟】基于运动估计的激光雷达和相机标定方法

【泡泡一分钟】基于运动估计的激光雷达和相机标定方法

泡泡机器人SLAM

25+阅读 · 2019年1月17日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

误差反向传播——CNN

误差反向传播——CNN

统计学习与视觉计算组

30+阅读 · 2018年7月12日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

【论文推荐】最新七篇图像分割相关论文—Attention U-Net、对抗结构匹配损失、卷积CRFs、对抗样本、弱监督分割

专知

19+阅读 · 2018年5月31日

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

专知

33+阅读 · 2018年4月23日

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

【论文推荐】最新七篇视觉问答（VQA）相关论文—差别注意力机制、视觉问题推理、视觉对话、数据可视化、记忆增强网络、显式推理

专知

17+阅读 · 2018年4月19日

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

【论文推荐】最新七篇自注意力机制(Self-attention)相关论文—结构化自注意力、相对位置、混合、句子表达、文本向量

专知

29+阅读 · 2018年3月12日

【CNN】一文读懂卷积神经网络CNN

【CNN】一文读懂卷积神经网络CNN

产业智能官

18+阅读 · 2018年1月2日

【推荐】(Keras)LSTM多元时序预测教程

【推荐】(Keras)LSTM多元时序预测教程

机器学习研究会

24+阅读 · 2017年8月14日

相关论文

TReR: A Lightweight Transformer Re-Ranking Approach for 3D LiDAR Place Recognition

Arxiv

0+阅读 · 2023年5月29日

Contextualized End-to-End Speech Recognition with Contextual Phrase Prediction Network

Arxiv

0+阅读 · 2023年5月29日

ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition

Arxiv

0+阅读 · 2023年5月28日

PowerGAN: A Machine Learning Approach for Power Side-Channel Attack on Compute-in-Memory Accelerators

Arxiv

0+阅读 · 2023年5月27日

Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition

Arxiv

0+阅读 · 2023年5月27日

Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person

Arxiv

0+阅读 · 2023年5月26日

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

Arxiv

0+阅读 · 2023年5月25日

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

Arxiv

0+阅读 · 2023年5月25日

Deep Generative Model for Simultaneous Range Error Mitigation and Environment Identification

Arxiv

0+阅读 · 2023年5月23日

End-to-End Multi-Task Learning with Attention

Arxiv

19+阅读 · 2018年3月28日

相关基金

零辅助数据MIMO雷达自适应检测问题研究

国家自然科学基金

7+阅读 · 2015年12月31日

面向无线多媒体传感器网络的高效压缩视频感知

国家自然科学基金

0+阅读 · 2015年12月31日

不相交QoS路径与斯坦纳网络的近似算法研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于深度神经网络的噪声鲁棒性语音识别方法研究

国家自然科学基金

4+阅读 · 2013年12月31日

2.13微米脉冲单频激光器泵浦的窄线宽中红外ZGP-OPO研究

国家自然科学基金

0+阅读 · 2013年12月31日

语音识别中的稀疏性深度学习

国家自然科学基金

11+阅读 · 2012年12月31日

基于压缩网络编码的无线传感网隐私与安全机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

具有自主产权的安诚嵌入式处理器上支持AES及GF(2^n)运算的指令扩展结构研究

国家自然科学基金

0+阅读 · 2012年12月31日

无线传感器网络隐藏节点研究

国家自然科学基金

0+阅读 · 2012年12月31日

缓存技术在无线传感器网络中的研究与应用

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员