Image super-resolution (SR) serves as a fundamental tool for the processing and transmission of multimedia data. Recently, Transformer-based models have achieved competitive performance in image SR. They divide images into fixed-size patches and apply self-attention to these patches to model long-range dependencies among pixels. However, this architectural design originated in high-level vision tasks and lacks design guidelines drawn from SR-specific knowledge. In this paper, we aim to design a new attention block whose insights come from the interpretation of Local Attribution Maps (LAM) for SR networks. Specifically, LAM presents a hierarchical importance map in which the most important pixels are located in a fine area of a patch, while some less important pixels are spread across a coarse area of the whole image. To access pixels in the coarse area, instead of using a very large patch size, we propose a lightweight Global Pixel Access (GPA) module that applies cross-attention with the most similar patch in the image. In the fine area, we use an Intra-Patch Self-Attention (IPSA) module to model long-range pixel dependencies within a local patch, and a $3\times3$ convolution is then applied to process the finest details. In addition, a Cascaded Patch Division (CPD) strategy is proposed to enhance the perceptual quality of recovered images. Extensive experiments suggest that our method outperforms state-of-the-art lightweight SR methods by a large margin. Code is available at https://github.com/passerer/HPINet.
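The core idea behind the GPA module described above, cross-attention between a patch and its most similar counterpart elsewhere in the image, can be sketched as follows. This is a minimal single-head NumPy illustration, not the paper's implementation: the cosine-similarity patch matching, the projection dimension, and the function names are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def most_similar_patch(patches, idx):
    """Index of the patch most similar to patches[idx] (cosine similarity
    on flattened patches); the query patch itself is excluded."""
    flat = patches.reshape(len(patches), -1)
    keys = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sims = keys @ keys[idx]
    sims[idx] = -np.inf          # do not match the patch with itself
    return int(np.argmax(sims))

def gpa_cross_attention(query_patch, ref_patch, Wq, Wk, Wv):
    """Pixels of query_patch attend to pixels of ref_patch.
    Shapes: patches are (P, C); Wq/Wk/Wv are (C, d)."""
    Q = query_patch @ Wq
    K = ref_patch @ Wk
    V = ref_patch @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]), axis=-1)  # (P, P)
    return attn @ V                                          # (P, d)

# Toy demo: 6 patches of 16 pixels with 3 channels.
rng = np.random.default_rng(0)
patches = rng.standard_normal((6, 16, 3))
j = most_similar_patch(patches, 0)
Wq, Wk, Wv = (rng.standard_normal((3, 8)) / np.sqrt(3) for _ in range(3))
out = gpa_cross_attention(patches[0], patches[j], Wq, Wk, Wv)
print(out.shape)  # (16, 8): one d-dim feature per pixel of the query patch
```

Because attention is computed against a single matched patch rather than every patch in the image, the cost stays linear in the number of patches, which is what makes this kind of global pixel access lightweight.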