Vision Transformers (ViTs) have achieved state-of-the-art performance on various computer vision applications. These models, however, have considerable storage and computational overheads, making their deployment and efficient inference on edge devices challenging. Quantization is a promising approach to reducing model complexity; unfortunately, existing efforts to quantize ViTs adopt simulated quantization (aka fake quantization), which retains floating-point arithmetic during inference and thus contributes little to model acceleration. In this paper, we propose I-ViT, an integer-only quantization scheme for ViTs, which enables ViTs to perform the entire computational graph of inference with integer arithmetic and bit-shifting, without any floating-point operations. In I-ViT, linear operations (e.g., MatMul and Dense) follow the integer-only pipeline with dyadic arithmetic, and non-linear operations (e.g., Softmax, GELU, and LayerNorm) are approximated by the proposed lightweight integer-only arithmetic methods. In particular, I-ViT applies the proposed Shiftmax and ShiftGELU, which use integer bit-shifting to approximate the corresponding floating-point operations. We evaluate I-ViT on various benchmark models, and the results show that integer-only INT8 quantization achieves accuracy comparable to (or even higher than) the full-precision (FP) baseline. Furthermore, we utilize TVM for practical hardware deployment on the GPU's integer arithmetic units, achieving a 3.72--4.11$\times$ inference speedup over the FP model.
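As a concrete illustration of the dyadic-arithmetic pipeline mentioned above, the minimal Python sketch below shows how an INT32 MatMul accumulator can be requantized to INT8 using only an integer multiply and a bit-shift. The helper name `dyadic_rescale` and the constants `b=1258`, `c=22` are illustrative assumptions, not values from the paper; the key point is that the floating-point requantization scale is pre-approximated offline by a dyadic number $b/2^c$, so no floating-point arithmetic is needed at inference time.

```python
import numpy as np

def dyadic_rescale(acc32: np.ndarray, b: int, c: int) -> np.ndarray:
    # Requantize an INT32 accumulator with the dyadic scale b / 2**c,
    # chosen offline to approximate the floating-point requantization
    # scale; inference then needs only an integer multiply and a shift.
    return np.clip((acc32.astype(np.int64) * b) >> c, -128, 127).astype(np.int8)

# Toy usage: a scale of roughly 0.0003 approximated as 1258 / 2**22
# (hypothetical values for this sketch).
acc = np.random.randint(-2**20, 2**20, size=(4, 4), dtype=np.int32)
print(dyadic_rescale(acc, b=1258, c=22))
```

Shiftmax and ShiftGELU follow the same spirit, replacing the multiplications inside the non-linear operations with integer bit-shifts so the whole inference graph stays integer-only.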