While some studies have shown that the Swin Transformer (Swin) with window self-attention (WSA) is suitable for single image super-resolution (SR), plain WSA overlooks broad regions when reconstructing high-resolution images because of its limited receptive field. In addition, many deep-learning SR methods are computationally intensive. To address these problems, we introduce the N-Gram context to low-level vision with Transformers for the first time. We define an N-Gram as a group of neighboring local windows in Swin, in contrast to text analysis, which treats an N-Gram as consecutive characters or words. N-Grams interact with one another through sliding-WSA, expanding the regions seen when restoring degraded pixels. Using the N-Gram context, we propose NGswin, an efficient SR network with an SCDP bottleneck that takes the multi-scale outputs of the hierarchical encoder. Experimental results show that NGswin achieves competitive performance compared with previous leading methods while maintaining an efficient structure. Moreover, we also improve other Swin-based SR methods with the N-Gram context, thereby building an enhanced model, SwinIR-NG. Our improved SwinIR-NG outperforms the current best lightweight SR approaches and establishes state-of-the-art results. Codes are available at https://github.com/rami0205/NGramSwin.
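To make the window-based definition concrete, the sketch below illustrates (under our own simplifying assumptions, not the paper's actual implementation) how a feature map is partitioned into non-overlapping Swin windows and how an N-Gram can be formed by sliding over the window grid and grouping each window with its neighbors. The function names `window_partition` and `ngram_context`, and the restriction to horizontal 2-grams, are hypothetical choices for illustration only; the full method additionally uses shifted windows and attention over the grouped regions.

```python
import numpy as np

def window_partition(x, ws):
    # Split a (H, W, C) feature map into non-overlapping ws x ws windows,
    # as in Swin. Returns shape (num_windows, ws, ws, C).
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, ws, ws, C)

def ngram_context(x, ws, n=2):
    # Hypothetical sketch: slide over the window grid and gather each
    # window together with its (n - 1) horizontal neighbors, mimicking
    # the "neighboring local windows" definition of an N-Gram.
    H, W, C = x.shape
    wins = x.reshape(H // ws, ws, W // ws, ws, C).transpose(0, 2, 1, 3, 4)
    grid_h, grid_w = wins.shape[:2]
    out = []
    for i in range(grid_h):
        for j in range(grid_w - n + 1):
            # Concatenate n horizontally adjacent windows along the
            # width axis, widening the region each token can attend to.
            out.append(np.concatenate(wins[i, j:j + n], axis=1))
    return np.stack(out)  # (num_ngrams, ws, n * ws, C)
```

For an 8x8 single-channel map with `ws=4`, `window_partition` yields four 4x4 windows, while `ngram_context` with `n=2` yields two 4x8 regions, each covering a pair of adjacent windows — the enlarged receptive field that sliding-WSA would then attend over.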