Vision transformers have recently achieved great success on various computer vision tasks; nevertheless, their high model complexity makes them challenging to deploy on resource-constrained devices. Quantization is an effective approach to reducing model complexity, and data-free quantization, which can address data privacy and security concerns during model deployment, has received widespread interest. Unfortunately, all existing methods, such as BN regularization, were designed for convolutional neural networks and cannot be applied to vision transformers, whose model architectures differ significantly. In this paper, we propose PSAQ-ViT, a Patch Similarity Aware data-free Quantization framework for Vision Transformers, which generates "realistic" samples based on the vision transformer's unique properties and uses them to calibrate the quantization parameters. Specifically, we analyze the self-attention module's properties and reveal a general difference (patch similarity) in its processing of Gaussian noise versus real images. This insight guides us to design a relative value metric that optimizes the Gaussian noise to approximate real images, which are then used to calibrate the quantization parameters. Extensive experiments and ablation studies on various benchmarks validate the effectiveness of PSAQ-ViT, which can even outperform real-data-driven methods. Code is available at: https://github.com/zkkli/PSAQ-ViT.
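The abstract only sketches the mechanism, so the following is a minimal, hypothetical PyTorch illustration of the core idea: measure pairwise patch similarity at each transformer block, score its diversity with a differentiable entropy estimate, and ascend that score starting from Gaussian noise. Every name here (`patch_similarity`, `similarity_entropy`, `collect_block_outputs`, `generate_calibration_images`), the KDE-based entropy estimator, and the choice of hook points are assumptions for illustration only; the authors' actual implementation is in the linked repository.

```python
import math

import torch
import torch.nn.functional as F


def patch_similarity(features):
    """Pairwise cosine similarity between patch features.

    features: (batch, num_patches, dim) tensor from a transformer block.
    returns:  (batch, num_patches, num_patches) similarity matrix.
    """
    f = F.normalize(features, dim=-1)      # unit-normalize each patch vector
    return f @ f.transpose(-2, -1)         # all-pairs cosine similarity


def similarity_entropy(sim, bandwidth=0.05, max_samples=1024):
    """Differentiable entropy estimate of the similarity values via a
    Gaussian kernel density estimate; the intuition is that real-image-like
    inputs yield a more diverse (higher-entropy) similarity distribution
    than pure Gaussian noise."""
    vals = sim.flatten()
    if vals.numel() > max_samples:         # subsample to keep the O(M^2) KDE tractable
        idx = torch.randperm(vals.numel(), device=vals.device)[:max_samples]
        vals = vals[idx]
    diff = vals.unsqueeze(0) - vals.unsqueeze(1)
    kernel = torch.exp(-0.5 * (diff / bandwidth) ** 2)
    density = kernel.mean(dim=1) / (bandwidth * math.sqrt(2 * math.pi))
    return -(density + 1e-8).log().mean()  # resubstitution entropy estimate


def collect_block_outputs(model, x):
    """Run the model once and grab each transformer block's output,
    assuming a timm-style plain ViT with a .blocks list."""
    feats, hooks = [], []
    for blk in model.blocks:
        hooks.append(blk.register_forward_hook(
            lambda m, inp, out: feats.append(out[:, 1:])))  # drop class token
    model(x)
    for h in hooks:
        h.remove()
    return feats


def generate_calibration_images(model, steps=500, lr=0.1,
                                shape=(1, 3, 224, 224)):
    """Optimize Gaussian noise so that patch-similarity entropy rises across
    blocks, nudging the input toward real-image statistics; the resulting
    images would then be fed to a quantization calibrator (omitted here)."""
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)
    x = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        feats = collect_block_outputs(model, x)
        loss = -sum(similarity_entropy(patch_similarity(f)) for f in feats)
        loss.backward()
        opt.step()
    return x.detach()
```

Under these assumptions, any timm-style plain ViT exposing `.blocks` (e.g. `timm.create_model('deit_tiny_patch16_224', pretrained=True)`) could be passed to `generate_calibration_images`; hierarchical models such as Swin would need different hook points.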