Real-world image denoising is a practical image restoration problem that aims to recover clean images from noisy in-the-wild inputs. Recently, the Vision Transformer (ViT) has exhibited a strong ability to capture long-range dependencies, and many researchers have attempted to apply ViT to image denoising tasks. However, a real-world image is an isolated frame, so ViT can only build long-range dependencies among the image's internal patches; dividing the image into patches disarranges the noise pattern and breaks gradient continuity. In this article, we propose to resolve this issue with a continuous Wavelet Sliding-Transformer, called DnSwin, that builds frequency correspondences under real-world scenes. Specifically, we first extract bottom features from the noisy input images using a CNN encoder. The key to DnSwin is to separate high-frequency and low-frequency information from these features and build frequency dependencies. To this end, we propose the Wavelet Sliding-Window Transformer, which applies a discrete wavelet transform, self-attention, and an inverse discrete wavelet transform to extract deep features. Finally, we reconstruct the deep features into denoised images using a CNN decoder. Both quantitative and qualitative evaluations on real-world denoising benchmarks demonstrate that the proposed DnSwin performs favorably against state-of-the-art methods.
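To make the frequency-separation step concrete, the following is a minimal sketch of the "discrete wavelet transform, process, inverse discrete wavelet transform" pattern on a single 2D feature map. It assumes a single-level orthonormal Haar wavelet, which the abstract does not specify, and the names `haar_dwt2`, `haar_idwt2`, `wavelet_block`, and `process` are hypothetical. The `process` argument is only a stand-in for the self-attention stage; this is not the DnSwin implementation.

```python
def haar_dwt2(x):
    """Single-level 2D Haar DWT of an even-sized 2D list.

    Returns four half-resolution subbands: LL (low frequency) and
    LH, HL, HH (high-frequency details).
    """
    h, w = len(x), len(x[0])
    assert h % 2 == 0 and w % 2 == 0, "height and width must be even"
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 2  # local average
            lh[i // 2][j // 2] = (a - b + c - d) / 2  # difference between columns
            hl[i // 2][j // 2] = (a + b - c - d) / 2  # difference between rows
            hh[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution map from four subbands."""
    h2, w2 = len(ll), len(ll[0])
    x = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s, p, q, r = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (s + p + q + r) / 2
            x[2 * i][2 * j + 1] = (s - p + q - r) / 2
            x[2 * i + 1][2 * j] = (s + p - q - r) / 2
            x[2 * i + 1][2 * j + 1] = (s - p - q + r) / 2
    return x


def wavelet_block(x, process=lambda band: band):
    """DWT -> per-subband processing -> IDWT.

    `process` is a hypothetical placeholder for the self-attention stage;
    the default (identity) leaves every subband unchanged.
    """
    bands = haar_dwt2(x)
    return haar_idwt2(*(process(band) for band in bands))


# Toy 4x4 "feature map" (illustrative values only).
features = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
ll, lh, hl, hh = haar_dwt2(features)
restored = wavelet_block(features)
```

With the identity `process`, `restored` equals `features`, which illustrates that the transform pair itself loses no information: any change to the output comes only from what is done to the subbands in between.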