This paper tackles the inefficiency of the vision transformer caused by the high computational and space complexity of Multi-Head Self-Attention (MHSA). To this end, we propose Hierarchical MHSA (H-MHSA), whose representation is computed in a hierarchical manner. Specifically, we first divide the input image into patches, as is commonly done, and treat each patch as a token. H-MHSA then learns token relationships within local patches, which serves as local relationship modeling. Next, the small patches are merged into larger ones, and H-MHSA models the global dependencies among the small number of merged tokens. Finally, the local and global attentive features are aggregated to obtain features with powerful representation capacity. Since attention is calculated for only a limited number of tokens at each step, the computational load is reduced dramatically. Hence, H-MHSA can efficiently model global relationships among tokens without sacrificing fine-grained information. With the H-MHSA module incorporated, we build a family of Hierarchical-Attention-based Transformer Networks, namely HAT-Net. To demonstrate the superiority of HAT-Net in scene understanding, we conduct extensive experiments on fundamental vision tasks, including image classification, semantic segmentation, object detection, and instance segmentation. These experiments suggest that HAT-Net offers a new perspective on the vision transformer. Code and pretrained models are available at https://github.com/yun-liu/HAT-Net.
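To make the efficiency argument concrete, the following is a rough back-of-the-envelope sketch, not the authors' implementation. It counts query-key score pairs for full self-attention versus a two-step scheme of the kind described above: attention inside non-overlapping local windows, followed by attention among merged tokens. The function names, the 56x56 token grid, the window size, and the merge factor are all illustrative assumptions; the actual HAT-Net configuration may differ.

```python
def full_attention_pairs(height: int, width: int) -> int:
    """Score pairs when every token attends to every other token."""
    num_tokens = height * width
    return num_tokens * num_tokens


def hierarchical_attention_pairs(height: int, width: int,
                                 window: int, merge: int) -> int:
    """Score pairs for a local step plus a global step.

    Assumptions (hypothetical, for illustration only):
    - local step: attention inside non-overlapping window x window groups;
    - global step: tokens are merged merge x merge, and attention is
      computed among the merged tokens only.
    """
    if height % window or width % window:
        raise ValueError("grid must be divisible by the window size")
    if height % merge or width % merge:
        raise ValueError("grid must be divisible by the merge factor")

    tokens_per_window = window * window
    num_windows = (height // window) * (width // window)
    local_pairs = num_windows * tokens_per_window * tokens_per_window

    merged_tokens = (height // merge) * (width // merge)
    global_pairs = merged_tokens * merged_tokens

    return local_pairs + global_pairs


if __name__ == "__main__":
    # Hypothetical 56 x 56 token grid with 8 x 8 windows and 8 x 8 merging.
    full = full_attention_pairs(56, 56)
    hier = hierarchical_attention_pairs(56, 56, window=8, merge=8)
    print(f"full: {full}, hierarchical: {hier}, ratio: {full / hier:.1f}x")
```

Under these assumed settings the hierarchical scheme evaluates far fewer score pairs than full attention, which illustrates the kind of saving the abstract refers to. The count ignores projection layers and the aggregation step, so it is only a proxy for actual cost.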