As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages and disadvantages of current methods. With this survey and its organization, we hope to present a useful snapshot of current research in quantization for Neural Networks and to provide an organization that eases the evaluation of future research in this area.
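The mapping described above, from a set of continuous real values to a fixed discrete set of integers, can be illustrated with a minimal sketch of uniform affine quantization. The function names and parameters here are illustrative, not from the surveyed methods or any particular library; real implementations handle calibration, symmetry, and rounding modes with more care.

```python
def quantize(values, num_bits=8):
    """Map real values onto b-bit integer codes in [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    # Scale stretches the real range over the integer grid; guard the
    # degenerate case where all inputs are equal.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Zero point: the integer code that represents the real value 0.0.
    zero_point = round(qmin - lo / scale)
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate real values from the integer codes."""
    return [(qi - zero_point) * scale for qi in q]

# Quantizing five real values to 4 bits loses at most half a step of
# precision (scale / 2) away from the clipped endpoints.
q, s, z = quantize([-1.0, -0.5, 0.0, 0.5, 1.0], num_bits=4)
x = dequantize(q, s, z)  # approximate reconstruction of the inputs
```

The trade-off named in the abstract is visible here: fewer bits shrink storage (4 bits per value instead of 32 for a float) but coarsen the grid, so the reconstruction error grows as `num_bits` decreases.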