While lightweight ViT frameworks have made tremendous progress in image super-resolution, their uni-dimensional self-attention modeling and homogeneous aggregation scheme limit the effective receptive field (ERF), preventing more comprehensive interactions across both the spatial and channel dimensions. To tackle these drawbacks, this work proposes two enhanced components under a new Omni-SR architecture. First, an Omni Self-Attention (OSA) block is proposed based on the dense interaction principle, which simultaneously models pixel interactions from both the spatial and channel dimensions, mining potential correlations across omni-axes (i.e., spatial and channel). Coupled with mainstream window-partitioning strategies, OSA achieves superior performance within a compelling computational budget. Second, a multi-scale interaction scheme is proposed to mitigate the sub-optimal ERF (i.e., premature saturation) of shallow models, facilitating local propagation as well as meso- and global-scale interactions, rendering an omni-scale aggregation building block. Extensive experiments demonstrate that Omni-SR achieves record-high performance on lightweight super-resolution benchmarks (e.g., 26.95 dB on Urban100 $\times 4$ with only 792K parameters). Our code is available at \url{https://github.com/Francis0625/Omni-SR}.
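To make the omni-axis idea concrete, the following is a minimal PyTorch sketch of attention computed along both the spatial axis (tokens attend over positions) and the channel axis (channels attend over channels) within one window, with the two results fused. The module name, single-head design, and additive fusion are illustrative assumptions for exposition, not the paper's exact OSA block; consult the released code for the actual implementation.

```python
import torch
import torch.nn as nn


class OmniSelfAttentionSketch(nn.Module):
    """Illustrative sketch (not the official OSA): self-attention applied
    along both the spatial and channel axes of one window's tokens."""

    def __init__(self, dim: int):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        self.dim = dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) -- N tokens of one window, C channels
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)  # each (B, N, C)

        # Spatial attention: (B, N, N) map, positions attend to positions.
        attn_sp = (q @ k.transpose(-2, -1)) * (C ** -0.5)
        out_sp = attn_sp.softmax(dim=-1) @ v  # (B, N, C)

        # Channel attention: (B, C, C) map, channels attend to channels.
        attn_ch = (q.transpose(-2, -1) @ k) * (N ** -0.5)
        out_ch = (attn_ch.softmax(dim=-1) @ v.transpose(-2, -1))
        out_ch = out_ch.transpose(-2, -1)  # back to (B, N, C)

        # Additive fusion of both axes (an assumption of this sketch).
        return self.proj(out_sp + out_ch)
```

In a window-partitioned pipeline, the feature map would first be split into non-overlapping windows, each flattened to `(B, N, C)` tokens before this module is applied.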