Transformer-based methods have achieved impressive image restoration performance thanks to their capacity to model long-range dependencies, compared with CNN-based methods. However, advanced methods such as SwinIR adopt a window-based local attention strategy to balance performance and computational overhead, which restricts the use of large receptive fields to capture global information and establish long-range dependencies in the early layers. To capture global information more efficiently, in this work we propose SwinFIR, which extends SwinIR with Fast Fourier Convolution (FFC) components that have an image-wide receptive field. We also revisit other advanced techniques, i.e., data augmentation, pre-training, and feature ensemble, to improve image reconstruction. Our feature ensemble method considerably enhances model performance without increasing training or testing time. We apply our algorithm to multiple popular large-scale benchmarks and achieve state-of-the-art performance compared with existing methods. For example, our SwinFIR achieves a PSNR of 32.83 dB on the Manga109 dataset, which is 0.8 dB higher than the state-of-the-art SwinIR method.
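The image-wide receptive field of FFC comes from operating in the frequency domain: after a 2D FFT, every frequency coefficient depends on all spatial positions, so even a 1x1 convolution applied there mixes global information. The following is a minimal PyTorch sketch of such a spectral branch; the class name and layer layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SpectralTransform(nn.Module):
    """Minimal sketch of the frequency branch of a Fast Fourier Convolution
    (FFC) block. A real 2D FFT maps the feature map to the frequency domain,
    where a 1x1 convolution mixes information from every spatial position,
    giving an image-wide receptive field."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv over stacked real/imaginary parts of the spectrum.
        self.conv_freq = nn.Conv2d(channels * 2, channels * 2, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Real FFT over the spatial dimensions: complex tensor (B, C, H, W//2+1).
        freq = torch.fft.rfft2(x, norm="ortho")
        # Stack real and imaginary parts along the channel axis.
        freq = torch.cat([freq.real, freq.imag], dim=1)
        freq = self.relu(self.conv_freq(freq))
        real, imag = freq.chunk(2, dim=1)
        # Back to the spatial domain with the original resolution.
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")
```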
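The claim that feature ensemble adds no training or testing cost is consistent with averaging the parameters of several trained checkpoints into one model, so inference is still a single forward pass. Below is a hedged sketch of that idea, assuming PyTorch; the function name, argument list, and checkpoint format are hypothetical and not taken from the paper.

```python
import copy
import torch


def average_checkpoints(model: torch.nn.Module, checkpoint_paths, weights=None):
    """Average the parameters of several trained checkpoints into one model.
    The ensembled model runs a single forward pass, so test-time cost is
    identical to any individual checkpoint."""
    if weights is None:
        weights = [1.0 / len(checkpoint_paths)] * len(checkpoint_paths)

    avg_state = None
    for path, w in zip(checkpoint_paths, weights):
        state = torch.load(path, map_location="cpu")
        if avg_state is None:
            avg_state = {k: v.float() * w for k, v in state.items()}
        else:
            for k, v in state.items():
                avg_state[k] += v.float() * w

    ensembled = copy.deepcopy(model)
    ensembled.load_state_dict(avg_state)
    return ensembled
```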