Quantization is an effective technique for reducing the memory footprint, inference latency, and power consumption of deep learning models. However, existing quantization methods suffer from accuracy degradation compared to full-precision (FP) models due to the errors introduced by coarse gradient estimation through non-differentiable quantization layers. The existence of sharp local minima in the loss landscapes of overparameterized models (e.g., Transformers) tends to aggravate this performance penalty in low-bit (2, 4 bits) settings. In this work, we propose sharpness- and quantization-aware training (SQuAT), which encourages the model to converge to flatter minima while performing quantization-aware training. Our proposed method alternates training between the sharpness objective and the step-size objective, which lets the model learn the most suitable parameter update magnitude and converge near flat minima. Extensive experiments show that our method consistently outperforms state-of-the-art quantized BERT models under 2-, 3-, and 4-bit settings on the GLUE benchmarks by 1%, and can sometimes even outperform full-precision (32-bit) models. Our empirical measurements of sharpness also suggest that our method leads to flatter minima compared to other quantization methods.
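To make the alternation described above concrete, the following is a minimal sketch, not the authors' implementation: it alternates a SAM-style sharpness step on the weights with an LSQ-style update of learnable quantization step sizes. The toy model, the neighborhood radius `rho`, the even/odd alternation schedule, and the specific quantizer formulation are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch (assumptions, not the paper's code): alternate a
# sharpness-aware weight update with a quantization step-size update.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuantLinear(nn.Module):
    """Linear layer with a learnable quantization step size (LSQ-style)."""
    def __init__(self, in_f, out_f, n_bits=2):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_f))
        self.step = nn.Parameter(torch.tensor(0.05))   # learnable step size
        self.qmax = 2 ** (n_bits - 1) - 1

    def forward(self, x):
        # Fake-quantize weights; straight-through estimator for round().
        w = self.weight / self.step.clamp(min=1e-6)
        w = (w.round() - w).detach() + w
        w = w.clamp(-self.qmax - 1, self.qmax) * self.step
        return F.linear(x, w, self.bias)

model = nn.Sequential(QuantLinear(16, 32), nn.ReLU(), QuantLinear(32, 2))
weight_params = [p for n, p in model.named_parameters() if not n.endswith("step")]
step_params   = [p for n, p in model.named_parameters() if n.endswith("step")]
opt_w = torch.optim.SGD(weight_params, lr=1e-2)
opt_s = torch.optim.SGD(step_params, lr=1e-3)
rho = 0.05                                             # SAM neighborhood radius (assumed)

for it in range(100):
    x = torch.randn(8, 16)
    y = torch.randint(0, 2, (8,))
    if it % 2 == 0:
        # Sharpness objective: ascend to a nearby point in weight space ...
        loss = F.cross_entropy(model(x), y)
        grads = torch.autograd.grad(loss, weight_params)
        norm = torch.norm(torch.stack([g.norm() for g in grads]))
        with torch.no_grad():
            for p, g in zip(weight_params, grads):
                p.add_(rho * g / (norm + 1e-12))
        # ... then descend using the gradient taken at the perturbed weights.
        F.cross_entropy(model(x), y).backward()
        with torch.no_grad():
            for p, g in zip(weight_params, grads):
                p.sub_(rho * g / (norm + 1e-12))       # undo the perturbation
        opt_w.step(); opt_w.zero_grad(); opt_s.zero_grad()
    else:
        # Step-size objective: update only the quantizer step sizes.
        F.cross_entropy(model(x), y).backward()
        opt_s.step(); opt_s.zero_grad(); opt_w.zero_grad()
```

The point of alternating the two objectives in this sketch is that the step sizes are updated against weights that have already been pushed toward flatter regions, so the learned update magnitude tracks the sharpness-aware trajectory rather than competing with it in a single joint step.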