Large language models (LLMs) have demonstrated outstanding performance on various tasks, but their deployment poses challenges due to their enormous model size. In this paper, we identify that the main challenge in quantizing LLMs stems from the different activation ranges across channels, rather than solely from the presence of outliers. We propose a novel reorder-based quantization approach, RPTQ, that addresses the problem of quantizing the activations of LLMs. RPTQ rearranges the channels in the activations and then quantizes them in clusters, thereby reducing the impact of range differences across channels. In addition, we reduce the storage and computation overhead by avoiding explicit reordering. With this approach, we achieve a significant breakthrough by pushing LLMs to 3-bit activation quantization for the first time.
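The core idea described above, grouping channels with similar activation ranges and quantizing each group with its own parameters, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes asymmetric uniform quantization, uses a tiny k-means over per-channel (min, max) statistics to form the clusters, and the function names (`cluster_channels`, `quantize_per_cluster`) are hypothetical.

```python
import numpy as np

def cluster_channels(acts, n_clusters=4, iters=20, seed=0):
    """Group channels by their (min, max) activation range with a small k-means.

    acts: (tokens, channels) activation matrix. Returns a (channels,) label array.
    """
    feats = np.stack([acts.min(axis=0), acts.max(axis=0)], axis=1)  # (C, 2)
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), n_clusters, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((feats[:, None] - centers) ** 2).sum(-1), axis=1)
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = feats[labels == k].mean(axis=0)
    return labels

def quantize_per_cluster(acts, labels, bits=3):
    """Fake-quantize each cluster of channels with its own scale and zero point."""
    out = np.empty_like(acts)
    qmax = 2 ** bits - 1
    for k in np.unique(labels):
        cols = labels == k
        lo, hi = acts[:, cols].min(), acts[:, cols].max()
        scale = max(hi - lo, 1e-8) / qmax
        q = np.clip(np.round((acts[:, cols] - lo) / scale), 0, qmax)
        out[:, cols] = q * scale + lo  # dequantize back for error measurement
    return out
```

Because channels with similar ranges share quantization parameters, narrow-range channels are no longer forced onto the coarse step size dictated by wide-range channels, which is what makes low-bit activation quantization viable in this setting.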