In this work, we propose a low-bit training framework for convolutional neural networks, built around a novel multi-level scaling (MLS) tensor format. Our framework reduces the energy consumption of convolution operations by quantizing all convolution operands to a low bit-width format. Specifically, we propose the MLS tensor format, in which the element-wise bit-width can be greatly reduced. We then describe the dynamic quantization scheme and the low-bit tensor convolution arithmetic that leverage the MLS tensor format efficiently. Experiments show that our framework achieves a better trade-off between accuracy and bit-width than previous low-bit training frameworks. For training a variety of models on CIFAR-10, a 1-bit mantissa and a 2-bit exponent are adequate to keep the accuracy loss within 1%; on larger datasets such as ImageNet, a 4-bit mantissa and a 2-bit exponent are adequate to keep the accuracy loss within 1%. Through an energy-consumption simulation of the computing units, we estimate that training a variety of models with our framework achieves 8.3~10.2X and 1.9~2.3X higher energy efficiency than training with full-precision and 8-bit floating-point arithmetic, respectively.
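To make the element-wise low-bit floating-point format concrete, the following is a minimal NumPy sketch that rounds a tensor onto a toy grid with a configurable mantissa and exponent bit-width and a single shared per-tensor scale. The function name, the rounding scheme, and the single-level scale are illustrative assumptions for this sketch only; they do not reproduce the paper's MLS tensor format or its dynamic quantization.

import numpy as np

def quantize_low_bit_float(x, mant_bits=4, exp_bits=2):
    """Round a tensor onto a toy low-bit floating-point grid.

    Each value is approximated as sign * mantissa * 2**exp, where the
    mantissa keeps `mant_bits` fractional bits and the exponent takes one
    of 2**exp_bits values below the tensor-wise maximum exponent (a shared
    per-tensor scale). Illustrative sketch, not the MLS format itself.
    """
    sign = np.sign(x)
    mag = np.abs(x)
    # Shared scale: align the largest magnitude with the top exponent.
    max_exp = np.floor(np.log2(mag.max() + 1e-30))
    # Per-element exponent, clipped to the representable exponent range.
    exp = np.floor(np.log2(mag + 1e-30))
    exp = np.clip(exp, max_exp - (2 ** exp_bits - 1), max_exp)
    # Round the mantissa to `mant_bits` fractional bits; values far below
    # the shared scale are flushed toward zero, as in a real low-bit format.
    mant = np.round(mag / 2 ** exp * 2 ** mant_bits) / 2 ** mant_bits
    return sign * mant * 2 ** exp

In such a scheme, weights, activations, and gradients would each be quantized with this kind of routine before the convolution, so that the multiply-accumulate operations run on low bit-width operands while a small number of shared scaling factors preserve dynamic range.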