Transformer language models such as GPT-2 are difficult to quantize because outliers in their activations cause large quantization error. Adapting to this error requires quantization-aware training, a fine-tuning process that relies on the same dataset and training pipeline as the original model. Pretrained language models, however, often do not grant access to their datasets and training pipelines, forcing us to fine-tune on arbitrary ones. In that case, quantization-aware training is observed to overfit the model to the fine-tuning data. To quantize without overfitting, we introduce a quantization adapter (Quadapter), a small set of parameters learned to make activations quantization-friendly by scaling them channel-wise, while keeping the model parameters unchanged. By applying our method to the challenging task of quantizing GPT-2, we demonstrate that it effectively prevents overfitting and improves quantization performance.
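The core idea described above is a learnable per-channel scale applied to activations, with its inverse folded back so the full-precision function of the network is preserved. Below is a minimal PyTorch sketch of such a channel-wise scaling adapter; the class name, the `quantize` callable, and the exact placement of the scale and its inverse are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Quadapter(nn.Module):
    """Illustrative channel-wise scaling adapter (sketch, not the authors' code).

    A learnable per-channel scale is applied before quantization and undone
    afterward, so the full-precision mapping is unchanged while activations
    become easier to quantize. The model weights stay frozen; only the scale
    parameters are trained.
    """
    def __init__(self, num_channels: int):
        super().__init__()
        # Initialized to 1 so the adapter starts as an identity mapping.
        self.scale = nn.Parameter(torch.ones(num_channels))

    def forward(self, x: torch.Tensor, quantize) -> torch.Tensor:
        # x: (..., num_channels); `quantize` is any fake-quantization function.
        x_scaled = x / self.scale      # shrink outlier channels
        x_q = quantize(x_scaled)       # quantize the friendlier activations
        return x_q * self.scale        # undo the scaling afterward
```

In practice, the inverse scaling can be absorbed into the weights of an adjacent linear layer, so the adapter adds only a per-channel multiplication at inference time.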