Quantization of transformer language models faces significant challenges due to harmful outliers in activations. We observe that these outliers are asymmetric and concentrated in specific channels. To address this issue, we propose the Outlier Suppression+ framework. First, we introduce channel-wise shifting and scaling operations that eliminate the asymmetry and scale down the problematic channels; we show that these operations can be seamlessly migrated into subsequent modules while preserving equivalence. Second, we quantitatively derive the optimal shifting and scaling values, accounting for both the asymmetric distribution of activations and the quantization error of the weights in the next layer. Our lightweight framework incurs minimal performance degradation under static, standard post-training quantization settings. Comprehensive results across tasks and models show that our approach achieves near-floating-point performance on both small models, such as BERT, and large language models (LLMs), including OPT, BLOOM, and BLOOMZ, at 8-bit and 6-bit settings. Furthermore, we establish a new state of the art for 4-bit BERT.
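To make the migration equivalence concrete, below is a minimal PyTorch sketch. The function name, the calibration interface, and the range-based scale heuristic are illustrative assumptions rather than the paper's exact procedure (the paper chooses the scale jointly with the next layer's weight quantization error); the sketch only shows how a per-channel shift and scale on an activation can be absorbed into the following nn.Linear without changing the network output.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def migrate_shift_scale(x_calib: torch.Tensor, linear: nn.Linear):
    """Fold a per-channel shift/scale of an activation into the next Linear.

    x_calib: calibration activations, shape (num_tokens, in_features).
    Returns (shift, scale) such that feeding (x - shift) / scale to the
    updated `linear` reproduces the original output exactly (before any
    quantization noise is introduced).
    """
    c_max = x_calib.max(dim=0).values
    c_min = x_calib.min(dim=0).values

    # Shift removes the per-channel asymmetry of the activation range.
    shift = (c_max + c_min) / 2
    # Scale shrinks outlier channels; a simple range-based heuristic is
    # used here purely for illustration.
    half_range = (c_max - c_min) / 2
    scale = torch.clamp(half_range / half_range.median(), min=1.0)

    # Equivalence being exploited:
    #   x @ W.T + b == ((x - shift) / scale) @ (W * scale).T + (b + shift @ W.T)
    if linear.bias is None:
        linear.bias = nn.Parameter(torch.zeros(
            linear.out_features,
            dtype=linear.weight.dtype,
            device=linear.weight.device,
        ))
    linear.bias.add_(shift @ linear.weight.t())   # absorb shift into the bias
    linear.weight.mul_(scale)                     # absorb scale into weight columns
    return shift, scale
```

After this transformation, only the shifted and scaled activation (x - shift) / scale needs to be quantized, while the floating-point forward pass remains mathematically unchanged.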