The complicated architecture and high training cost of vision transformers urge the exploration of post-training quantization. However, the heavy-tailed distribution of vision transformer activations hinders the effectiveness of previous post-training quantization methods, even with advanced quantizer designs. Instead of tuning the quantizer to better fit the complicated activation distribution, this paper proposes NoisyQuant, a quantizer-agnostic enhancement for the post-training activation quantization performance of vision transformers. We make the surprising theoretical discovery that, for a given quantizer, adding a fixed noisy bias sampled from a Uniform distribution to the values being quantized can significantly reduce the quantization error under provable conditions. Building on this theoretical insight, NoisyQuant achieves the first success in actively altering the heavy-tailed activation distribution with an additive noisy bias to fit a given quantizer. Extensive experiments show that NoisyQuant largely improves the post-training quantization performance of vision transformers with minimal computation overhead. For instance, with linear uniform 6-bit activation quantization, NoisyQuant improves the SOTA top-1 accuracy on ImageNet by up to 1.7%, 1.1%, and 0.5% for ViT, DeiT, and Swin Transformer respectively, achieving on-par or even higher performance than previous nonlinear, mixed-precision quantization methods.
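To make the core mechanism concrete, below is a minimal NumPy sketch of the idea described above: a fixed noisy bias is sampled once from a Uniform distribution, added to the activations before a linear uniform quantizer, and subtracted again after dequantization. All function names, the noise range, and the toy heavy-tailed data are illustrative assumptions, not the paper's implementation; in the actual method the denoising step can be folded into the following layer at no runtime cost.

```python
import numpy as np

def uniform_quantize(x, n_bits=6):
    """Linear uniform quantizer: round x onto 2**n_bits levels and dequantize."""
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min) / (2 ** n_bits - 1)
    q = np.round((x - x_min) / scale)
    return q * scale + x_min

def noisy_quant(x, n_bits=6, noise_range=0.5, seed=0):
    """NoisyQuant-style sketch: quantize (x + N) with a fixed Uniform noisy
    bias N, then subtract N after dequantization, so the output estimates x."""
    rng = np.random.default_rng(seed)
    # N is sampled once per channel and kept fixed at inference time.
    noise = rng.uniform(-noise_range, noise_range, size=x.shape[-1])
    x_deq = uniform_quantize(x + noise, n_bits)
    # Removing the known bias; in practice this subtraction can be
    # pre-computed and absorbed into the next layer's bias term.
    return x_deq - noise

# Toy heavy-tailed activations (Laplace-like), the regime the paper targets.
rng = np.random.default_rng(1)
x = rng.laplace(scale=1.0, size=(1024, 64))

err_plain = np.mean((uniform_quantize(x, 6) - x) ** 2)
err_noisy = np.mean((noisy_quant(x, 6) - x) ** 2)
# Under the paper's provable conditions the noisy variant yields lower error.
print(f"plain MSE: {err_plain:.6f}  noisy MSE: {err_noisy:.6f}")
```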