A Survey of Quantization Methods for Efficient Neural Network Inference

As soon as abstract mathematical computations were adapted to computation on digital computers, the problem of efficient representation, manipulation, and communication of the numerical values in those computations arose. Strongly related to the problem of numerical representation is the problem of quantization: in what manner should a set of continuous real-valued numbers be distributed over a fixed discrete set of numbers to minimize the number of bits required and also to maximize the accuracy of the attendant computations? This perennial problem of quantization is particularly relevant whenever memory and/or computational resources are severely restricted, and it has come to the forefront in recent years due to the remarkable performance of Neural Network models in computer vision, natural language processing, and related areas. Moving from floating-point representations to low-precision fixed integer values represented in four bits or less holds the potential to reduce the memory footprint and latency by a factor of 16x; and, in fact, reductions of 4x to 8x are often realized in practice in these applications. Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks. In this article, we survey approaches to the problem of quantizing the numerical values in deep Neural Network computations, covering the advantages/disadvantages of current methods. With this survey and its organization, we hope to have presented a useful snapshot of the current research in quantization for Neural Networks and to have given an intelligent organization to ease the evaluation of future research in this area.
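The core question in the abstract, how to place real values onto a small discrete set, can be illustrated with a minimal sketch of symmetric uniform quantization. This is one simple scheme chosen for illustration, not necessarily a method the survey recommends; the function names `quantize_uniform` and `dequantize` and the sample weights are hypothetical.

```python
def quantize_uniform(values, num_bits=4):
    """Map floats onto signed `num_bits`-bit integers using one shared scale.

    Assumption: symmetric uniform quantization with round-to-nearest,
    picked here only as the simplest possible illustration.
    """
    qmax = 2 ** (num_bits - 1) - 1   # largest representable integer, 7 for 4 bits
    qmin = -(2 ** (num_bits - 1))    # smallest representable integer, -8 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # real-valued step size
    # Round to the nearest grid point, then clamp into the integer range.
    q = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate real values from the integers."""
    return [x * scale for x in q]


weights = [-1.0, -0.35, 0.0, 0.2, 0.7]   # made-up sample values
q, scale = quantize_uniform(weights, num_bits=4)
restored = dequantize(q, scale)

# Worst-case reconstruction error; bounded by half a step for unclamped values.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

# Storage ratio of 32-bit floats to 4-bit integers (ignoring the stored scale).
compression_vs_fp32 = 32 / 4

print(q, round(max_error, 4), compression_vs_fp32)
```

The sketch makes the trade-off concrete: fewer bits means a coarser grid (a larger `scale`) and therefore a larger rounding error, in exchange for a smaller memory footprint.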


Related content

Neural Networks


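Neural Networks is the archival journal of the world's three oldest neural modeling societies: the International Neural Network Society (INNS), the European Neural Network Society (ENNS), and the Japanese Neural Network Society (JNNS). Neural Networks provides a forum for developing and nurturing an international community of scholars and practitioners interested in all aspects of neural networks and related approaches to computational intelligence. The journal welcomes high-quality submissions that contribute to the full range of neural network research, from behavioral and brain modeling and learning algorithms, through mathematical and computational analyses, to engineering and technological applications of systems that make significant use of neural network concepts and techniques. This uniquely broad scope encourages the exchange of ideas between biological and technological studies and helps foster an interdisciplinary community interested in biologically inspired computational intelligence. Accordingly, the editorial board of Neural Networks represents expertise in fields including psychology, neurobiology, computer science, engineering, mathematics, and physics. The journal publishes articles, letters, and reviews, as well as letters to the editor, editorials, current events, software surveys, and patent information. Articles are published in one of five sections: Cognitive Science, Neuroscience, Learning Systems, Mathematical and Computational Analysis, and Engineering and Applications. Website: http://dblp.uni-trier.de/db/journals/nn/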