Large language models have been widely adopted but require significant GPU memory for inference. We develop a procedure for Int8 matrix multiplication for the feed-forward and attention projection layers in transformers, which cuts the memory needed for inference by half while retaining full-precision performance. With our method, a 175B-parameter 16/32-bit checkpoint can be loaded, converted to Int8, and used immediately without performance degradation. This is made possible by understanding and working around properties of highly systematic emergent features in transformer language models that dominate attention and transformer predictive performance. To cope with these features, we develop a two-part quantization procedure, LLM.int8(). We first use vector-wise quantization with separate normalization constants for each inner product in the matrix multiplication to quantize most of the features. For the emergent outliers, however, we also include a new mixed-precision decomposition scheme, which isolates the outlier feature dimensions into a 16-bit matrix multiplication while more than 99.9% of the values are still multiplied in 8-bit. Using LLM.int8(), we show empirically that it is possible to perform inference in LLMs with up to 175B parameters without any performance degradation. This result makes such models much more accessible, for example making it possible to use OPT-175B/BLOOM on a single server with consumer GPUs.
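To make the two-part procedure concrete, here is a minimal NumPy sketch of the idea: vector-wise Int8 quantization with per-inner-product normalization constants for the bulk of the features, plus a higher-precision matrix multiplication for the outlier feature dimensions. The function name, the outlier threshold, and the use of NumPy are illustrative assumptions for exposition, not the bitsandbytes implementation or API.

```python
import numpy as np

def llm_int8_matmul_sketch(X, W, outlier_threshold=6.0):
    """Illustrative sketch (not the reference implementation) of the
    two-part procedure: vector-wise Int8 quantization plus a
    higher-precision matmul for outlier feature dimensions.

    X: (tokens, features) hidden states
    W: (features, out_features) weights
    """
    # 1) Mixed-precision decomposition: feature (column) dimensions of X
    #    whose magnitude exceeds the threshold are handled in higher precision.
    outlier_cols = np.where(np.abs(X).max(axis=0) >= outlier_threshold)[0]
    regular_cols = np.setdiff1d(np.arange(X.shape[1]), outlier_cols)

    out_hi = X[:, outlier_cols].astype(np.float32) @ W[outlier_cols, :].astype(np.float32)

    # 2) Vector-wise quantization for the remaining values: one scale per
    #    row of X and one per column of W, so each inner product has its
    #    own pair of normalization constants.
    Xr, Wr = X[:, regular_cols], W[regular_cols, :]
    cx = np.abs(Xr).max(axis=1, keepdims=True) / 127.0   # row-wise scales
    cw = np.abs(Wr).max(axis=0, keepdims=True) / 127.0   # column-wise scales
    Xi8 = np.round(Xr / np.maximum(cx, 1e-8)).astype(np.int8)
    Wi8 = np.round(Wr / np.maximum(cw, 1e-8)).astype(np.int8)

    # Accumulate in Int32, then dequantize with the outer product of scales.
    out_int8 = (Xi8.astype(np.int32) @ Wi8.astype(np.int32)) * (cx * cw)

    return out_int8 + out_hi
```

In this sketch the outlier columns are detected from the activation magnitudes and routed around the Int8 path, so only a small fraction of the feature dimensions ever leaves 8-bit.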