Convolutional Neural Networks (CNNs) have proven to be a powerful state-of-the-art method for image classification tasks. One drawback, however, is their high computational complexity and memory consumption, which makes them infeasible for execution on embedded platforms that lack the physical resources needed to support CNNs. Quantization is often used to optimize CNNs for memory and computational complexity, at the cost of a loss in prediction accuracy. We therefore propose a method to optimally quantize the weights, biases, and activations of each layer of a pre-trained CNN while controlling the loss in inference accuracy, enabling quantized inference. We quantize the 32-bit floating-point parameters to low-bitwidth fixed-point representations, finding optimal bitwidths and fractional offsets for the parameters of each layer of a given CNN. Parameters are quantized post-training, without re-training the network. Our method quantizes the parameters of a CNN while taking into account how the other parameters are quantized, because ignoring the quantization errors introduced by other quantized parameters leads to a low-precision CNN with accuracy losses of up to 50%, far beyond what is acceptable. Our final method therefore yields a low-precision CNN with accuracy losses of less than 1%. Compared to a method used by commercial tools that quantizes all parameters to 8 bits, our approach produces quantized CNNs with, on average, 53% lower memory consumption and a 77.5% lower cost of executing multiplications for the two CNNs trained on the four datasets on which we tested our work. We find that layer-wise quantization of parameters significantly helps in this process.
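To make the fixed-point representation concrete, the sketch below shows one common way a float32 tensor can be mapped to a signed b-bit integer with f fractional bits (the "fractional offset"), so each value is approximated by q * 2^(-f). This is only a minimal illustration under our own assumptions, not the accuracy-controlled, layer-aware selection procedure of the paper; the function names and the brute-force error search are hypothetical placeholders.

```python
import numpy as np


def quantize_fixed_point(x, bitwidth, frac_bits):
    """Simulate quantizing a float32 tensor to signed fixed-point.

    Each value is approximated by q * 2**(-frac_bits), where q is a signed
    integer representable in `bitwidth` bits.
    """
    scale = 2.0 ** frac_bits
    qmin = -(2 ** (bitwidth - 1))
    qmax = 2 ** (bitwidth - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)
    return q / scale  # dequantized value, as used in simulated fixed-point inference


def choose_format(x, max_bitwidth=16):
    """Pick the (bitwidth, frac_bits) pair with the smallest mean squared error.

    This brute-force search is only a stand-in for the layer-wise,
    accuracy-controlled selection described in the paper.
    """
    best = None
    for bitwidth in range(2, max_bitwidth + 1):
        for frac_bits in range(0, bitwidth):
            err = np.mean((x - quantize_fixed_point(x, bitwidth, frac_bits)) ** 2)
            if best is None or err < best[0]:
                best = (err, bitwidth, frac_bits)
    return best[1], best[2]


# Example: the weights of one layer are quantized independently of other layers,
# illustrating layer-wise quantization on synthetic data.
weights = (np.random.randn(64, 3, 3, 3) * 0.1).astype(np.float32)
b, f = choose_format(weights)
print(f"chosen bitwidth={b}, fractional bits={f}")
```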