Image restoration has advanced significantly with the development of deep learning models. Transformer-based models, particularly those using window-based self-attention, have become a dominant approach. However, their performance is constrained by the rigid, non-overlapping window partitioning scheme, which leads to \textit{insufficient feature interaction across windows and limited receptive fields}. This highlights the need for more adaptive and flexible attention mechanisms. In this paper, we propose the Deformable Sliding Window Transformer for Image Restoration (DSwinIR), whose core is a new attention mechanism: Deformable Sliding Window (DSwin) Attention. This mechanism introduces a token-centric, content-aware paradigm that moves beyond the fixed grid and window partition. It comprises two complementary components. First, it replaces the rigid partitioning with a \textit{token-centric sliding window} paradigm, which is effective at eliminating boundary artifacts. Second, it incorporates a \textit{content-aware deformable sampling} strategy, which allows the attention mechanism to learn data-dependent offsets and actively shape its receptive field to focus on the most informative image regions. Extensive experiments show that DSwinIR achieves strong results, including state-of-the-art performance on several of the evaluated benchmarks. For instance, in all-in-one image restoration, DSwinIR surpasses the recent GridFormer backbone by 0.53 dB on the three-task benchmark and 0.87 dB on the five-task benchmark.
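To make the two components concrete, the following is a minimal NumPy sketch of token-centric sliding-window attention with deformable sampling. It is an illustration under stated assumptions, not the DSwinIR implementation: the function names `bilinear_sample` and `dswin_attention_sketch` are hypothetical, queries, keys, and values all reuse the input features without learned projections, a single head is used, the offsets are passed in as an array rather than predicted by a network, and out-of-range sampling positions are clamped to the image border.

```python
import numpy as np


def bilinear_sample(feat, ys, xs):
    """Bilinearly sample feat (H, W, C) at fractional coordinates.

    ys and xs share one shape S; the result has shape S + (C,).
    Coordinates are clamped to the image border (an assumption of this sketch).
    """
    H, W, _ = feat.shape
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[..., None]
    wx = (xs - x0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy


def dswin_attention_sketch(x, offsets, k=3):
    """Token-centric sliding-window attention with deformable sampling.

    x:       (H, W, C) feature map, used as query, key, and value (assumption).
    offsets: (H, W, k*k, 2) per-token (dy, dx) shifts; learned in a real model,
             supplied directly here.
    """
    H, W, C = x.shape
    r = k // 2
    # Regular k x k grid of relative positions around each token.
    dy, dx = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1), indexing="ij")
    base = np.stack([dy.ravel(), dx.ravel()], axis=-1)        # (k*k, 2)
    cy, cx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    centers = np.stack([cy, cx], axis=-1)[:, :, None, :]      # (H, W, 1, 2)
    # Each token gets its own window, deformed by its offsets.
    pos = centers + base + offsets                            # (H, W, k*k, 2)
    kv = bilinear_sample(x, pos[..., 0], pos[..., 1])         # (H, W, k*k, C)
    # Scaled dot-product attention of each token over its sampled window.
    logits = np.einsum("hwc,hwnc->hwn", x, kv) / np.sqrt(C)
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return np.einsum("hwn,hwnc->hwc", weights, kv)
```

With all offsets set to zero, the sketch reduces to plain sliding-window attention over a fixed k x k neighborhood centered on each token. Nonzero offsets move the sampling positions, which is how the receptive field can adapt to the content.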