Network quantization aims at reducing the bit-widths of weights and/or activations, which is particularly important for implementing deep neural networks with limited hardware resources. Most methods use the straight-through estimator (STE) to train quantized networks, which avoids the zero-gradient problem by replacing the derivative of a discretizer (i.e., a round function) with that of an identity function. Although quantized networks exploiting the STE have shown decent performance, the STE is sub-optimal in that it simply propagates the same gradient without considering the discretization error between the input and output of the discretizer. In this paper, we propose element-wise gradient scaling (EWGS), a simple yet effective alternative to the STE that trains a quantized network better than the STE in terms of stability and accuracy. Given a gradient of the discretizer output, EWGS adaptively scales each gradient element up or down, and uses the scaled gradient as the gradient of the discretizer input to train quantized networks via backpropagation. The scaling is performed depending on both the sign of each gradient element and the error between the continuous input and discrete output of the discretizer. We adjust the scaling factor adaptively using Hessian information of the network. We show extensive experimental results on image classification datasets, including CIFAR-10 and ImageNet, with diverse network architectures under a wide range of bit-width settings, demonstrating the effectiveness of our method.
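To make the backward rule concrete, the following is a minimal PyTorch sketch of the element-wise scaling idea described above. The specific scaling form g_in = g_out * (1 + delta * sign(g_out) * (x_in - x_out)), the class name `EWGSDiscretizer`, and the fixed value of `delta` are illustrative assumptions; in particular, the Hessian-based adjustment of the scaling factor is omitted here.

```python
# A minimal sketch, assuming a round-to-nearest discretizer and the scaling
# form g_in = g_out * (1 + delta * sign(g_out) * (x_in - x_out)).
# This is not the reference implementation of the paper.
import torch


class EWGSDiscretizer(torch.autograd.Function):
    """Round-to-nearest in the forward pass; element-wise gradient
    scaling (instead of the plain STE) in the backward pass."""

    @staticmethod
    def forward(ctx, x_in, delta):
        x_out = torch.round(x_in)            # discretizer (round function)
        ctx.save_for_backward(x_in - x_out)  # per-element discretization error
        ctx.delta = delta                    # non-negative scaling factor (assumed fixed)
        return x_out

    @staticmethod
    def backward(ctx, grad_out):
        (error,) = ctx.saved_tensors
        # Scale each gradient element up or down depending on the sign of the
        # gradient and the error between continuous input and discrete output.
        scale = 1.0 + ctx.delta * torch.sign(grad_out) * error
        return grad_out * scale, None        # no gradient for delta


if __name__ == "__main__":
    x = torch.randn(4, requires_grad=True)
    y = EWGSDiscretizer.apply(x, 0.2)
    y.sum().backward()
    print(x.grad)  # element-wise scaled gradients, unlike the plain STE
```

With `delta = 0`, the backward pass reduces to the plain STE, which simply copies the output gradient to the input.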