Transformer-based methods have shown impressive performance in low-level vision tasks such as image super-resolution. However, through attribution analysis we find that these networks can only exploit a limited spatial range of the input information. This implies that the potential of the Transformer is still not fully realized in existing networks. To activate more input pixels for reconstruction, we propose a novel Hybrid Attention Transformer (HAT). It combines channel attention and self-attention schemes, thus exploiting their complementary advantages. Moreover, to better aggregate cross-window information, we introduce an overlapping cross-attention module that enhances the interaction between neighboring window features. In the training stage, we additionally adopt a same-task pre-training strategy to bring further improvement. Extensive experiments show the effectiveness of the proposed modules, and the overall method significantly outperforms state-of-the-art methods by more than 1 dB. Codes and models will be available at https://github.com/chxy95/HAT.
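To make the hybrid attention idea concrete, below is a minimal PyTorch sketch (not the authors' code) of a block that runs a squeeze-and-excitation style channel-attention branch in parallel with window self-attention and sums both into the residual path. The names (`HybridAttentionBlock`, `ChannelAttention`), the `squeeze` ratio, and the small branch weight `alpha` are illustrative assumptions; the exact implementation is in the repository linked above.

```python
# Illustrative sketch of combining channel attention with window
# self-attention; details are assumptions, see https://github.com/chxy95/HAT.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention over (B, N, C) tokens."""
    def __init__(self, dim: int, squeeze: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // squeeze), nn.ReLU(inplace=True),
            nn.Linear(dim // squeeze, dim), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.mlp(x.mean(dim=1, keepdim=True))  # global pool over tokens -> (B, 1, C)
        return x * w                               # rescale each channel globally

class HybridAttentionBlock(nn.Module):
    """Window self-attention plus a parallel channel-attention branch."""
    def __init__(self, dim: int, num_heads: int = 4, alpha: float = 0.01):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cab = ChannelAttention(dim)
        self.alpha = alpha  # small weight so the channel branch does not dominate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_windows * B, window_size * window_size, C) token layout
        h = self.norm(x)
        sa, _ = self.attn(h, h, h)                 # local self-attention within a window
        return x + sa + self.alpha * self.cab(h)   # hybrid residual update

# Usage: tokens from 8x8 windows with 96 channels.
blk = HybridAttentionBlock(dim=96)
out = blk(torch.randn(2, 64, 96))
print(out.shape)  # torch.Size([2, 64, 96])
```

The design intuition, as the abstract suggests, is that self-attention captures dependencies within each window while the channel branch injects a global, per-channel statistic, so the two are complementary.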