Model quantization is a widely used technique to compress and accelerate deep neural network (DNN) inference. Emerging DNN hardware accelerators have begun to support flexible bitwidths (1-8 bits) to further improve computation efficiency, which raises a great challenge: finding the optimal bitwidth for each layer requires domain experts to explore a vast design space, trading off among accuracy, latency, power, and model size, which is both time-consuming and sub-optimal. Conventional quantization algorithms ignore the differences among hardware architectures and quantize all layers in a uniform way. In this paper, we introduce the Hardware-Aware Automated Quantization (HAQ) framework, which leverages reinforcement learning to automatically determine the quantization policy, and we take the hardware accelerator's feedback into the design loop. Rather than relying on proxy signals such as FLOPs and model size, we employ a hardware simulator to generate direct feedback signals to the RL agent. Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures. Our framework effectively reduces latency by 1.4-1.95x and energy consumption by 1.9x with negligible loss of accuracy compared with fixed-bitwidth (8-bit) quantization. Our framework reveals that the optimal policies on different hardware architectures (i.e., edge and cloud architectures) under different resource constraints (i.e., latency, power, and model size) are drastically different. We interpret the implications of the different quantization policies, which offer insights for both neural network architecture design and hardware architecture design.
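To make the hardware-in-the-loop feedback structure concrete, below is a minimal sketch, not the authors' implementation: a simple random search stands in for the RL agent, and the functions simulated_latency, accuracy_proxy, and search_policy, along with all layer statistics and constants, are hypothetical placeholders for the hardware simulator and evaluation pipeline described in the abstract.

```python
import random

def simulated_latency(bitwidths, layer_macs):
    # Hypothetical latency model standing in for the hardware simulator:
    # assume latency scales roughly linearly with bitwidth per layer.
    return sum(b * macs for b, macs in zip(bitwidths, layer_macs)) * 1e-9

def accuracy_proxy(bitwidths):
    # Placeholder: penalize aggressive quantization below 4 bits.
    # A real setup would quantize the model and evaluate on validation data.
    return 1.0 - sum(max(0, 4 - b) * 0.01 for b in bitwidths)

def search_policy(num_layers, layer_macs, latency_budget, steps=1000):
    """Random search standing in for the RL agent: propose per-layer
    bitwidths (1-8 bits), query the simulated hardware feedback, and
    keep the best policy that satisfies the latency constraint."""
    best_policy, best_acc = None, -1.0
    for _ in range(steps):
        policy = [random.randint(1, 8) for _ in range(num_layers)]
        if simulated_latency(policy, layer_macs) > latency_budget:
            continue  # infeasible under the hardware resource constraint
        acc = accuracy_proxy(policy)
        if acc > best_acc:
            best_policy, best_acc = policy, acc
    return best_policy, best_acc

if __name__ == "__main__":
    macs = [1e6, 4e6, 8e6, 2e6]  # illustrative per-layer MAC counts
    policy, acc = search_policy(4, macs, latency_budget=0.08)
    print("bitwidths:", policy, "proxy accuracy:", acc)
```

Changing the latency budget or the (assumed) per-layer costs shifts which layers can afford higher bitwidths, which mirrors the abstract's observation that the optimal policy differs across hardware architectures and resource constraints.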