While some studies have shown that the Swin Transformer (SwinT) with window self-attention (WSA) is suitable for single image super-resolution (SR), SwinT ignores broad regions when reconstructing high-resolution images because of its limited window and shift sizes. In addition, many deep learning SR methods suffer from intensive computation. To address these problems, we introduce the N-Gram context to the image domain for the first time. We define an N-Gram as a group of neighboring local windows in SwinT, which differs from text analysis, where an N-Gram is a sequence of consecutive characters or words. N-Grams interact with each other through sliding-WSA, expanding the regions seen when restoring degraded pixels. Using the N-Gram context, we propose NGswin, an efficient SR network with an SCDP bottleneck that takes all outputs of the hierarchical encoder. Experimental results show that NGswin achieves competitive performance while keeping an efficient structure compared with previous leading methods. Moreover, we also improve other SwinT-based SR methods with the N-Gram context, thereby building an enhanced model, SwinIR-NG. The improved SwinIR-NG outperforms the current best lightweight SR approaches and establishes state-of-the-art results. Code will be available soon.