Vis-TOP: Visual Transformer Overlay Processor

In recent years, the Transformer has achieved strong results in Natural Language Processing (NLP) and has begun to expand into Computer Vision (CV), where models such as the Vision Transformer and Swin Transformer have emerged. At the same time, Transformer deployment has extended to embedded devices to serve resource-sensitive application scenarios. However, the large number of parameters, the complex computational flow, and the many structural variants of Transformer models raise a number of issues for hardware design. This is both an opportunity and a challenge. We propose Vis-TOP (Visual Transformer Overlay Processor), an overlay processor for various visual Transformer models. It differs both from coarse-grained overlay processors such as CPU, GPU, and NPE, and from fine-grained designs customized for a specific model. Vis-TOP summarizes the characteristics of all visual Transformer models and implements a three-layer, two-level transformation structure that allows the model to be switched or changed freely without changing the hardware architecture. The corresponding instruction bundle and hardware architecture are designed around this same three-layer, two-level transformation structure. After quantizing the Swin Transformer tiny model to 8-bit fixed point (fix_8), we implemented the overlay processor on the ZCU102. Compared to a GPU, the throughput of Vis-TOP is 1.5x higher. Compared to existing Transformer accelerators, our throughput per DSP is between 2.2x and 11.7x higher. In short, the approach in this paper meets the requirements of real-time AI in terms of both resource consumption and inference speed. Vis-TOP provides a cost-effective and power-efficient solution based on reconfigurable devices for computer vision at the edge.
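The abstract mentions quantizing the model to 8-bit fixed point (fix_8) but does not describe the scheme. As a rough illustration only, the sketch below shows one common form of signed 8-bit fixed-point quantization with a power-of-two scale. The function names, the `frac_bits` parameter, and the round-half-up and saturation choices are assumptions made for this example, not details taken from the paper.

```python
import math

INT8_MIN, INT8_MAX = -128, 127


def quantize_fix8(values, frac_bits):
    """Map floats to signed 8-bit codes, where code = value * 2**frac_bits.

    frac_bits is a hypothetical parameter: the number of fractional bits
    assumed for this tensor. Rounding is round-half-up; out-of-range
    results saturate to the int8 limits.
    """
    scale = 1 << frac_bits
    codes = []
    for v in values:
        q = math.floor(v * scale + 0.5)          # round half up
        q = max(INT8_MIN, min(INT8_MAX, q))      # saturate to int8 range
        codes.append(q)
    return codes


def dequantize_fix8(codes, frac_bits):
    """Recover approximate floats from 8-bit fixed-point codes."""
    scale = 1 << frac_bits
    return [q / scale for q in codes]


if __name__ == "__main__":
    weights = [0.5, -1.0, 3.0, 0.3]
    codes = quantize_fix8(weights, frac_bits=6)
    print(codes)
    print(dequantize_fix8(codes, frac_bits=6))
```

With 6 fractional bits the representable range is about -2.0 to 1.98 in steps of 1/64, so the value 3.0 in the usage example saturates. Choosing the number of fractional bits per tensor trades range against resolution.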
