INT8 quantization has become one of the standard techniques for deploying convolutional neural networks (CNNs) on edge devices to reduce memory and computational resource usage. By analyzing the quantized performance of existing mobile-target network architectures, we raise the issue that the network architecture itself matters for optimal INT8 quantization. In this paper, we present a new network architecture search (NAS) procedure to find a network that guarantees both full-precision (FLOAT32) and quantized (INT8) performance. We first propose critical but straightforward optimization methods that enable quantization-aware training (QAT): floating-point statistic assisting (StatAssist) and stochastic gradient boosting (GradBoost). By integrating gradient-based NAS with StatAssist and GradBoost, we discover a quantization-efficient network building block, the Frost bottleneck. We then use the Frost bottleneck as the building block for hardware-aware NAS to obtain quantization-efficient networks, FrostNets, which show improved quantization performance compared to other mobile-target networks while maintaining competitive FLOAT32 performance. When quantized, our FrostNets achieve higher recognition accuracy than existing CNNs with comparable latency, owing to their higher latency reduction rate (65% on average).
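For context, the sketch below illustrates the standard PyTorch eager-mode INT8 QAT pipeline that the abstract refers to (fake-quantization during training, then conversion to an INT8 model for edge deployment). The TinyConvNet model is a hypothetical stand-in for illustration; StatAssist and GradBoost are this paper's contributions and are not implemented here.

```python
import torch
import torch.nn as nn

# Minimal stand-in model; not an architecture from the paper.
class TinyConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # FLOAT32 -> INT8 entry point
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(16, num_classes)
        self.dequant = torch.quantization.DeQuantStub()  # INT8 -> FLOAT32 exit point

    def forward(self, x):
        x = self.quant(x)
        x = self.features(x)
        x = self.pool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return self.dequant(x)

model = TinyConvNet()
model.train()

# Attach fake-quantization observers that simulate INT8 arithmetic during training.
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)

# ... run the usual training loop here (QAT fine-tuning) ...

# Convert the fake-quantized model to a real INT8 model for deployment.
model.eval()
quantized_model = torch.quantization.convert(model)
```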