Post-training quantization (PTQ), which only requires a tiny dataset for calibration without end-to-end retraining, is a lightweight and practical model compression technique. Recently, several PTQ schemes for vision transformers (ViTs) have been presented; unfortunately, they typically suffer from non-trivial accuracy degradation, especially in low-bit cases. In this paper, we propose RepQ-ViT, a novel PTQ framework for ViTs based on quantization scale reparameterization, to address the above issues. RepQ-ViT decouples the quantization and inference processes, where the former employs complex quantizers and the latter employs scale-reparameterized simplified quantizers. This ensures both accurate quantization and efficient inference, which distinguishes it from existing approaches that sacrifice quantization performance to meet the target hardware. More specifically, we focus on two components with extreme distributions: post-LayerNorm activations with severe inter-channel variation and post-Softmax activations with power-law features, and first apply channel-wise quantization and log$\sqrt{2}$ quantization, respectively. Then, we reparameterize the scales to hardware-friendly layer-wise quantization and log2 quantization for inference, at only a slight cost in accuracy and computation. Extensive experiments are conducted on multiple vision tasks with different model variants, showing that RepQ-ViT, without hyperparameters and expensive reconstruction procedures, can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
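To make the scale reparameterization concrete, below is a minimal PyTorch sketch of the channel-wise-to-layer-wise conversion for post-LayerNorm activations. It is derived only from the equivalence between the two quantizers described above; the function name, the use of channel means as the layer-wise scale and zero-point, and the `ln`/`fc` module handles are illustrative assumptions, not the paper's released implementation.

```python
import torch

@torch.no_grad()
def reparam_post_ln_quantizer(s, z, ln, fc):
    """Fold a channel-wise post-LayerNorm quantizer (scales `s`,
    zero-points `z`, both of shape [C]) into a layer-wise one.
    A minimal sketch under the assumptions stated above; `ln` is the
    nn.LayerNorm and `fc` the nn.Linear that consumes its output."""
    s_tilde = s.mean()                    # layer-wise scale
    z_tilde = z.float().mean().round()    # layer-wise zero-point

    # Quantizing x with per-channel (s, z) yields the same integers as
    # quantizing x_tilde = x / r1 + c2 with layer-wise (s_tilde, z_tilde):
    r1 = s / s_tilde
    c2 = s_tilde * (z.float() - z_tilde)

    # Absorb the per-channel correction into LayerNorm's affine
    # parameters, so the network now emits x_tilde instead of x ...
    ln.weight.copy_(ln.weight / r1)
    ln.bias.copy_(ln.bias / r1 + c2)

    # ... and compensate in the next Linear layer. Since
    # x = r1 * (x_tilde - c2), the float output of LayerNorm -> Linear
    # is unchanged. Update the bias first: it needs the original weight.
    fc.bias.copy_(fc.bias - fc.weight @ (r1 * c2))
    fc.weight.copy_(fc.weight * r1)       # scale each input column

    return s_tilde, z_tilde
```

For the post-Softmax side, the conversion follows from the exact identity $2^{-q/2} = 2^{-(q \gg 1)} \cdot (\sqrt{2})^{-(q \,\&\, 1)}$: the even part of a log$\sqrt{2}$ code reduces to a pure base-2 bit-shift, and only odd codes carry a residual factor of $1/\sqrt{2}$.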