Quantization methods reduce the number of bits required to represent each parameter in a model, trading accuracy for smaller memory footprints and inference latencies. However, the final model size depends on both the number of parameters of the original model and the rate of compression. For example, a 30B 8-bit model and a 60B 4-bit model have the same number of bits but may have very different zero-shot accuracies. In this work, we study this trade-off by developing inference scaling laws of zero-shot performance in Large Language Models (LLMs) to determine the bit-precision and model size that maximizes zero-shot performance. We run more than 35,000 experiments with 16-bit inputs and k-bit parameters to examine which zero-shot quantization methods improve scaling for 3 to 8-bit precision at scales of 19M to 176B parameters across the LLM families BLOOM, OPT, NeoX/Pythia, and GPT-2. We find that it is challenging to improve the bit-level scaling trade-off, with the only improvements being the use of a small block size -- splitting the parameters into small independently quantized blocks -- and the quantization data type being used (e.g., Int vs Float). Overall, our findings show that 4-bit precision is almost universally optimal for total model bits and zero-shot accuracy.
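To make the block-size idea concrete, the following is a minimal NumPy sketch of block-wise absmax quantization: parameters are split into small blocks, each quantized independently with its own scale. The function names, the 4-bit signed-integer format, and the block size of 64 are illustrative assumptions, not the paper's exact implementation.

import numpy as np

def quantize_blockwise(params, bits=4, block_size=64):
    """Quantize a parameter tensor to signed k-bit integers, one absmax scale per block."""
    flat = params.astype(np.float32).ravel()
    pad = (-len(flat)) % block_size              # pad so the length is a multiple of block_size
    flat = np.pad(flat, (0, pad))
    blocks = flat.reshape(-1, block_size)
    qmax = 2 ** (bits - 1) - 1                   # e.g., 7 for 4-bit signed integers
    scales = np.abs(blocks).max(axis=1, keepdims=True) / qmax
    scales[scales == 0] = 1.0                    # avoid division by zero for all-zero blocks
    q = np.clip(np.round(blocks / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales.squeeze(1), params.shape, pad

def dequantize_blockwise(q, scales, shape, pad):
    """Reconstruct a float32 approximation of the original parameters."""
    flat = (q.astype(np.float32) * scales[:, None]).ravel()
    if pad:
        flat = flat[:-pad]
    return flat.reshape(shape)

if __name__ == "__main__":
    w = np.random.randn(1000).astype(np.float32)
    q, s, shape, pad = quantize_blockwise(w, bits=4, block_size=64)
    w_hat = dequantize_blockwise(q, s, shape, pad)
    print("mean abs quantization error:", np.abs(w - w_hat).mean())

A smaller block size stores more scale values (higher memory overhead per parameter) but limits the damage any single outlier can do, since it only affects the scale of its own block.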