When quantizing neural networks for efficient inference, low-bit integers are the go-to format for efficiency. However, low-bit floating point numbers have an extra degree of freedom, assigning some bits to work on an exponential scale instead. This paper investigates this benefit of the floating point format for neural network inference in depth. We detail the choices that can be made for the FP8 format, including the important choice of the number of bits for the mantissa and exponent, and show analytically in which settings these choices give better performance. Then we show how these findings translate to real networks, provide an efficient implementation for FP8 simulation, and introduce a new algorithm that enables learning both the scale parameters and the number of exponent bits in the FP8 format. Our chief conclusion is that when doing post-training quantization for a wide range of networks, the FP8 format is better than INT8 in terms of accuracy, and the choice of the number of exponent bits is driven by the severity of outliers in the network. We also conduct experiments with quantization-aware training, where the difference between formats disappears as the network is trained to reduce the effect of outliers.
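To make the exponent/mantissa trade-off concrete, the sketch below simulates rounding a tensor onto an FP8 grid for a given bit split. It is a minimal illustration, not the paper's implementation: the function name `fp8_quantize`, the IEEE-style default bias, and the saturation handling are assumptions, and the paper additionally learns the scale/bias and the exponent bit count rather than fixing them.

```python
import numpy as np

def fp8_quantize(x, n_exp=4, n_man=3, bias=None):
    """Simulate rounding to an FP8 format with n_exp exponent bits and
    n_man mantissa bits (1 sign bit; n_exp + n_man should equal 7).

    Illustrative sketch only; the exponent bias defaults to the
    IEEE-style value and no exponent codes are reserved for inf/NaN."""
    if bias is None:
        bias = 2 ** (n_exp - 1) - 1          # IEEE-style exponent bias (assumed)
    max_exp = 2 ** n_exp - 1 - bias          # largest usable exponent
    min_exp = 1 - bias                       # exponent of the smallest normal
    max_val = 2.0 ** max_exp * (2.0 - 2.0 ** -n_man)  # largest magnitude

    sign = np.sign(x)
    mag = np.abs(x).astype(np.float64)
    # Per-element exponent; values below the normal range share min_exp,
    # which reproduces subnormal spacing.
    exp = np.floor(np.log2(np.maximum(mag, 1e-45)))
    exp = np.clip(exp, min_exp, max_exp)
    # Round the mantissa to n_man fractional bits at that exponent.
    step = 2.0 ** (exp - n_man)
    q = np.round(mag / step) * step
    # Saturate anything beyond the largest representable magnitude.
    q = np.minimum(q, max_val)
    return sign * q

x = np.random.randn(8).astype(np.float32)
print(fp8_quantize(x, n_exp=4, n_man=3))   # more mantissa: finer grid, smaller range
print(fp8_quantize(x, n_exp=5, n_man=2))   # more exponent: coarser grid, wider range
```

Comparing the two calls shows the core trade-off discussed in the paper: extra exponent bits extend the dynamic range (helping with outliers) at the cost of precision around typical values.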