Transformer architectures with attention mechanisms have achieved success in Natural Language Processing (NLP), and Vision Transformers (ViTs) have recently extended their application to a variety of vision tasks. Despite their high performance, ViTs suffer from large model size and high computational complexity, which hinder their deployment on edge devices. To achieve high throughput on hardware while preserving model accuracy, we propose VAQF, a framework that builds inference accelerators on FPGA platforms for quantized ViTs with binary weights and low-precision activations. Given the model structure and the desired frame rate, VAQF automatically outputs the required quantization precision for activations as well as the optimized accelerator parameter settings that fulfill the hardware requirements. The implementations are developed with Vivado High-Level Synthesis (HLS) on the Xilinx ZCU102 FPGA board. Evaluation results with the DeiT-base model indicate that a frame rate requirement of 24 frames per second (FPS) is satisfied with 8-bit activation quantization, and a target of 30 FPS is met with 6-bit activation quantization. To the best of our knowledge, this is the first time quantization has been incorporated into ViT acceleration on FPGAs with the help of a fully automatic framework that, given the target frame rate, guides both the quantization strategy on the software side and the accelerator implementation on the hardware side. The compilation time is very small compared with the time needed for quantization training, and the generated accelerators are capable of real-time execution of state-of-the-art ViT models on FPGAs.
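To make "binary weights and low-precision activations" concrete, the following is a minimal Python sketch, not the VAQF implementation. It assumes a common scheme that the abstract does not specify: weights are reduced to their signs with a single mean-absolute-value scale, and activations are quantized uniformly to unsigned integers (negative values are clamped to zero). The function names `binarize_weights`, `quantize_activations`, and `quantized_dot` are hypothetical.

```python
from typing import List, Tuple


def binarize_weights(weights: List[float]) -> Tuple[List[int], float]:
    """Map weights to {-1, +1} plus one shared scale (assumed: mean |w|)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def quantize_activations(acts: List[float], bits: int) -> Tuple[List[int], float]:
    """Uniform unsigned quantization of activations to `bits` bits.

    Assumption: the range is [0, max(acts)]; negative inputs clamp to 0.
    """
    levels = (1 << bits) - 1
    max_val = max(acts)
    step = max_val / levels if max_val > 0 else 1.0
    codes = [min(levels, max(0, round(a / step))) for a in acts]
    return codes, step


def quantized_dot(signs: List[int], w_scale: float,
                  codes: List[int], a_step: float) -> float:
    """Dot product where the inner loop only adds or subtracts integers."""
    acc = sum(s * c for s, c in zip(signs, codes))
    # Scales are applied once, after the integer accumulation.
    return acc * w_scale * a_step


if __name__ == "__main__":
    signs, w_scale = binarize_weights([0.5, -1.5, 1.0])
    codes, a_step = quantize_activations([0.0, 51.0, 255.0], bits=8)
    print(signs, w_scale, codes, a_step)
    print(quantized_dot(signs, w_scale, codes, a_step))
```

With binary weights, each multiply in the inner loop reduces to an addition or subtraction of an integer activation code. Lowering `bits` narrows those integer operands, which is the kind of trade-off between precision and hardware throughput that the abstract describes.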