Neural network quantization is a promising compression technique to reduce memory footprint and save energy consumption, potentially leading to real-time inference. However, there is a performance gap between quantized and full-precision models. To reduce this gap, existing quantization approaches require high-precision INT32 or full-precision multiplication during inference for scaling or dequantization, which introduces a noticeable cost in memory, speed, and required energy. To tackle these issues, we present F8Net, a novel quantization framework consisting only of fixed-point 8-bit multiplication. To derive our method, we first discuss the advantages of fixed-point multiplication with different formats of fixed-point numbers and study the statistical behavior of the associated fixed-point values. Second, based on this statistical and algorithmic analysis, we apply different fixed-point formats to the weights and activations of different layers, and we introduce a novel algorithm that automatically determines the right format for each layer during training. Third, we analyze a previous quantization algorithm, parameterized clipping activation (PACT), and reformulate it using fixed-point arithmetic. Finally, we unify the recently proposed method for quantization fine-tuning with our fixed-point approach to show the potential of our method. We verify F8Net on ImageNet for MobileNet V1/V2 and ResNet18/50. Our approach achieves accuracy comparable to or better than existing quantization techniques that rely on INT32 multiplication or floating-point arithmetic, and even surpasses the full-precision counterparts, achieving state-of-the-art performance.
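To make the core primitive concrete, here is a minimal NumPy sketch (not the authors' implementation) of 8-bit fixed-point quantization and multiplication; the helper name `quantize_fixed_point` and the example fractional lengths are illustrative assumptions. An 8-bit fixed-point number with fractional length FL stores an integer mantissa q representing the value q · 2^(−FL), so the product of two such numbers is again a plain integer whose fractional length is the sum of the operands' fractional lengths, and realigning it to the next layer's format needs only a bit shift rather than an INT32 or floating-point rescaling multiply.

```python
import numpy as np

# Hypothetical helper: round a float array to signed fixed-point with
# `frac_len` fractional bits in a `word_len`-bit word.
def quantize_fixed_point(x, frac_len, word_len=8):
    scale = 2.0 ** frac_len
    qmin, qmax = -(2 ** (word_len - 1)), 2 ** (word_len - 1) - 1
    # Returns the integer mantissa; the represented value is mantissa / scale.
    return np.clip(np.round(x * scale), qmin, qmax).astype(np.int32)

# Example: an activation in an FL=5 format, a weight in an FL=6 format
# (the formats are chosen per layer, as described above).
a = quantize_fixed_point(np.array([0.73]), frac_len=5)
w = quantize_fixed_point(np.array([-0.41]), frac_len=6)

prod = a * w                      # plain integer multiply, no rescaling
value = prod / 2.0 ** (5 + 6)     # fractional lengths simply add: FL = 5 + 6
print(value)                      # ~ 0.73 * -0.41, up to rounding error
```

Because the output format is known ahead of time, aligning the product to the next layer's fractional length is a constant right shift, which is the property that lets the framework avoid high-precision dequantization.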
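For reference, the PACT baseline mentioned above clips activations to a learnable level alpha and quantizes the clipped range uniformly. Below is a minimal NumPy sketch of the original floating-point formulation (Choi et al., 2018); it does not reproduce the paper's fixed-point reformulation, only the baseline being reformulated.

```python
import numpy as np

def pact_quantize(x, alpha, k=8):
    """PACT: clip to [0, alpha] with a learnable alpha, then quantize
    uniformly to k bits. Floating-point reference version."""
    # Equivalent to np.clip(x, 0.0, alpha), written in PACT's smooth form
    # so that the gradient with respect to alpha is well defined.
    y = 0.5 * (np.abs(x) - np.abs(x - alpha) + alpha)
    step = alpha / (2 ** k - 1)          # uniform quantization step
    return np.round(y / step) * step     # dequantized value (float)

x = np.array([-0.3, 0.2, 1.5, 4.0])
print(pact_quantize(x, alpha=2.0, k=8))  # [0. 0.2 1.5 2.0] up to rounding
```

The round-and-rescale step is where the floating-point multiply by alpha / (2^k − 1) appears; per the abstract, the fixed-point reformulation absorbs this scaling into the choice of fixed-point format, so that only 8-bit fixed-point multiplications remain at inference time.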