While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge devices with strict power and compute requirements. Neural network quantization is one of the most effective ways of achieving these savings, but the additional noise it induces can lead to accuracy degradation. In this white paper, we introduce state-of-the-art algorithms for mitigating the impact of quantization noise on the network's performance while maintaining low-bit weights and activations. We start with a hardware-motivated introduction to quantization and then consider two main classes of algorithms: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). PTQ requires no re-training or labeled data and is thus a lightweight push-button approach to quantization. In most cases, PTQ is sufficient for achieving 8-bit quantization with close to floating-point accuracy. QAT requires fine-tuning and access to labeled training data but enables lower-bit quantization with competitive results. For both solutions, we provide tested pipelines based on existing literature and extensive experimentation that lead to state-of-the-art performance for common deep learning models and tasks.
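As a rough illustration of the quantization noise mentioned above, the following minimal sketch (not taken from this paper) applies a standard uniform affine 8-bit scheme to a tensor in NumPy; the function names and the example tensor are made up for illustration, and the scale/zero-point computation shown is one common asymmetric variant rather than the specific pipeline described later in the paper.

```python
import numpy as np

def quantize_uniform_affine(x, num_bits=8):
    """Quantize a float tensor to unsigned integers with a uniform affine scheme."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Scale maps the float range onto the integer grid; the zero-point aligns 0.0 with an integer.
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = np.round(qmin - x.min() / scale)
    x_int = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return x_int.astype(np.uint8), scale, zero_point

def dequantize(x_int, scale, zero_point):
    """Map integers back to floats; the gap to the original values is the quantization noise."""
    return scale * (x_int.astype(np.float32) - zero_point)

# Example: 8-bit quantization of a made-up weight tensor.
w = np.random.randn(64).astype(np.float32)
w_int, scale, zp = quantize_uniform_affine(w, num_bits=8)
w_hat = dequantize(w_int, scale, zp)
print("max quantization error:", np.abs(w - w_hat).max())
```

At 8 bits this rounding error is typically small enough that PTQ alone preserves near floating-point accuracy; at lower bit-widths the noise grows, which is where QAT's fine-tuning becomes necessary.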