State-of-the-art neural language models represented by Transformers are becoming increasingly complex and expensive for practical applications. Low-bit deep neural network quantization techniques provide a powerful solution to dramatically reduce their model size. Current low-bit quantization methods are based on uniform precision and fail to account for the varying performance sensitivity of different parts of the system to quantization errors. To this end, novel mixed precision DNN quantization methods are proposed in this paper. The optimal local precision settings are automatically learned using two techniques. The first is based on a quantization sensitivity metric in the form of the Hessian trace weighted quantization perturbation. The second is based on mixed precision Transformer architecture search. The alternating direction method of multipliers (ADMM) is used to efficiently train mixed precision quantized DNN systems. Experiments conducted on Penn Treebank (PTB) and a Switchboard corpus trained LF-MMI TDNN system suggest that the proposed mixed precision Transformer quantization techniques achieved model size compression ratios of up to 16 times over the full precision baseline with no recognition performance degradation. When used to compress a larger full precision Transformer LM with more layers, overall word error rate (WER) reductions of up to 1.7% absolute (18% relative) were obtained.
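As a minimal sketch of the first technique, a Hessian trace weighted sensitivity metric of the kind described here (following HAWQ-style analysis; the notation below is assumed for illustration and may differ from the paper's exact formulation) can be written for layer l under bit-width b as

\Omega_l(b) = \mathrm{Tr}(H_l)\,\lVert Q_b(W_l) - W_l \rVert_2^2

where W_l are the layer-l weights, H_l is the Hessian of the training loss with respect to W_l (its trace estimated, e.g., by Hutchinson sampling), and Q_b(\cdot) is the b-bit quantization operator. The local precision settings b_l can then be selected to minimise the summed sensitivity \sum_l \Omega_l(b_l) subject to an overall model size budget.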