We propose pruning ternary quantization (PTQ), a simple yet effective symmetric ternary quantization method. The method compresses neural network weights to sparse ternary values in {-1, 0, 1}, substantially reducing computational, storage, and memory footprints. We show that PTQ can convert regular weights to ternary orthonormal bases using only pruning and L2 projection. In addition, we introduce a refined straight-through estimator to finalize and stabilize the quantized weights. Our method achieves up to a 46x compression ratio on the ResNet-18 architecture with an acceptable accuracy of 65.36%, outperforming leading methods. Furthermore, PTQ compresses a ResNet-18 model from 46 MB to 955 KB (~48x) and a ResNet-50 model from 99 MB to 3.3 MB (~30x), while the top-1 accuracy on ImageNet drops only slightly, from 69.7% to 65.3% and from 76.15% to 74.47%, respectively. Our method unifies pruning and quantization and thus provides a range of size-accuracy trade-offs.
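To make the recipe concrete, the following is a minimal PyTorch sketch of symmetric ternary quantization combining magnitude pruning with a straight-through estimator. It is not the authors' implementation: the sparsity fraction, the mean-magnitude scaling factor, and the class name `TernarySTE` are illustrative assumptions standing in for the paper's pruning threshold and L2 projection.

```python
import torch

class TernarySTE(torch.autograd.Function):
    """Ternarize weights in the forward pass; pass gradients straight through in backward."""

    @staticmethod
    def forward(ctx, w, sparsity):
        # Prune: zero out the smallest-magnitude weights (illustrative threshold choice).
        k = max(int(sparsity * w.numel()), 1)
        threshold = w.abs().flatten().kthvalue(k).values
        mask = (w.abs() > threshold).float()
        # Symmetric ternary codes in {-1, 0, +1}, rescaled by the mean magnitude of the
        # surviving weights (a stand-in for the paper's L2 projection back to weight scale).
        alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1)
        return alpha * torch.sign(w) * mask

    @staticmethod
    def backward(ctx, grad_out):
        # Straight-through estimator: gradient flows to the full-precision weights unchanged.
        return grad_out, None

# Usage sketch: quantize a layer's weights during the forward pass of training.
w = torch.randn(64, 128, requires_grad=True)
w_q = TernarySTE.apply(w, 0.7)   # keep roughly 30% of weights, set the rest to 0
loss = w_q.sum()
loss.backward()                  # gradients reach w via the straight-through path
```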