This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.
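To make the multi-level scaling idea concrete, the following is a minimal NumPy sketch of a block quantizer that combines a shared power-of-two exponent per block with a small shared microexponent shift per sub-block. The function name and the parameters (block=16, subblock=2, mantissa_bits=4, micro_bits=1) are illustrative assumptions for exposition, not the paper's exact MX format definitions.

```python
import numpy as np

def quantize_two_level(x, block=16, subblock=2, mantissa_bits=4, micro_bits=1):
    """Illustrative two-level block quantization: a shared power-of-two
    exponent per block plus a small shared 'microexponent' shift per
    sub-block. Parameters are hypothetical, chosen for readability."""
    x = np.asarray(x, dtype=np.float64)
    out = np.empty_like(x)
    qmax = 2 ** (mantissa_bits - 1) - 1  # symmetric integer range for mantissas
    for b0 in range(0, len(x), block):
        blk = x[b0:b0 + block]
        # Level 1: coarse shared exponent for the whole block.
        block_exp = int(np.floor(np.log2(np.max(np.abs(blk)) + 1e-30)))
        for s0 in range(0, len(blk), subblock):
            sub = blk[s0:s0 + subblock]
            # Level 2: ultra-fine shared microexponent (a small extra shift).
            sub_exp = int(np.floor(np.log2(np.max(np.abs(sub)) + 1e-30)))
            micro = int(np.clip(block_exp - sub_exp, 0, 2 ** micro_bits - 1))
            scale = 2.0 ** (block_exp - micro)
            # Quantize mantissas against the combined scale and reconstruct.
            q = np.clip(np.round(sub / scale * qmax), -qmax, qmax)
            out[b0 + s0:b0 + s0 + len(sub)] = q / qmax * scale
    return out

x = np.random.randn(64)
xq = quantize_two_level(x)
print("max abs error:", np.max(np.abs(x - xq)))
```

The sketch illustrates why the fine scaling factor is cheap in hardware: it is only a narrow shift relative to the block's shared exponent, shared across a small sub-block rather than stored per element.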