Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks

Quantization has proven to be a vital method for improving the inference efficiency of deep neural networks (DNNs). However, it remains challenging to strike a good balance between accuracy and efficiency when quantizing DNN weights or activation values from high-precision formats to their quantized counterparts. We propose a new method called elastic significant bit quantization (ESB), which controls the number of significant bits of quantized values to obtain better inference accuracy with fewer resources. We design a unified mathematical formula that constrains the quantized values of ESB to a flexible number of significant bits. We also introduce a distribution difference aligner (DDA) to quantitatively align the distribution of the full-precision weight or activation values with that of the quantized values. Consequently, ESB suits the various bell-shaped distributions of DNN weights and activations and thus maintains high inference accuracy. Because its quantized values have fewer significant bits, ESB also reduces multiplication complexity. We implement ESB as an accelerator and quantitatively evaluate its efficiency on FPGAs. Extensive experimental results show that ESB quantization consistently outperforms state-of-the-art methods, achieving average accuracy improvements of 4.78%, 1.92%, and 3.56% on AlexNet, ResNet18, and MobileNetV2, respectively. Furthermore, the ESB accelerator achieves a peak performance of 10.95 GOPS per 1k LUTs without DSPs on the Xilinx ZCU102 FPGA platform. Compared with CPU, GPU, and state-of-the-art FPGA accelerators, the ESB accelerator improves energy efficiency by up to 65x, 11x, and 26x, respectively.
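To make the notion of "significant bits" concrete, the following minimal Python sketch rounds a non-negative integer to at most k significant bits and lists the resulting levels. It is an illustration only: the abstract does not state the ESB formula or the DDA, and the helper names `round_to_significant_bits` and `representable_levels` are hypothetical, so this should not be read as the authors' method.

```python
def round_to_significant_bits(n: int, k: int) -> int:
    """Round a non-negative integer to a value with at most k significant bits.

    Assumption for illustration: "significant bits" means the span from the
    leading 1 down to the lowest retained bit; ties round up.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    drop = n.bit_length() - k  # number of low-order bits to discard
    if drop <= 0:
        return n  # n already fits in k significant bits
    # Add half of the discarded range, then clear the discarded bits.
    return ((n + (1 << (drop - 1))) >> drop) << drop


def representable_levels(bits: int, k: int) -> list[int]:
    """List the values in [0, 2**bits) that need at most k significant bits."""
    return [n for n in range(1 << bits) if round_to_significant_bits(n, k) == n]


if __name__ == "__main__":
    print(round_to_significant_bits(13, 2))  # 0b1101 is rounded to 0b1100
    print(representable_levels(4, 2))
```

With 4-bit magnitudes and k = 2, the surviving levels are spaced more densely near zero and more sparsely toward the top of the range, which is one intuition for why such a scheme could suit bell-shaped distributions. Each level is a k-bit value followed by a shift, so a multiplication only needs a k-bit multiplier plus a shift.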
