Self-Supervised Image Denoising for Real-World Images with Context-aware Transformer

In recent years, deep learning has pushed image denoising to a new level. Within this field, self-supervised denoising is increasingly popular because it does not require any prior knowledge. Most existing self-supervised methods are based on convolutional neural networks (CNNs), which are restricted by the locality of their receptive fields and can cause color shifts or texture loss. In this paper, we propose a novel Denoise Transformer for real-world image denoising, built mainly from Context-aware Denoise Transformer (CADT) units and a Secondary Noise Extractor (SNE) block. CADT is designed as a dual-branch structure: the global branch uses a window-based Transformer encoder to extract global information, while the local branch extracts local features with a small receptive field. Using CADT as the basic component, we build a hierarchical network that directly learns the noise distribution through residual learning and produces the first-stage denoised output. We then design SNE, at low computational cost, for secondary global noise extraction. Finally, the blind spots are collected from the Denoise Transformer output and reconstructed to form the final denoised image. In extensive experiments on the real-world SIDD benchmark, the method achieves 50.62/0.990 PSNR/SSIM, only 0.17/0.001 below the current state-of-the-art method. Visual comparisons on public sRGB, Raw-RGB and greyscale datasets show that the proposed Denoise Transformer performs competitively, especially on blurred textures and low-light images, without using additional knowledge about the underlying unknown noise, such as noise level or noise type.
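The last two steps of the pipeline, residual learning followed by blind-spot collection, can be illustrated with a small NumPy sketch. The abstract does not specify the masking scheme or the network internals, so everything here is an assumption made for illustration: the names `neighbor_mean`, `toy_noise_predictor`, `residual_denoise` and `blind_spot_reconstruct` are hypothetical, the grid-style mask with a `cell` parameter is one common way to place blind spots, and a four-neighbor average stands in for the CADT/SNE network.

```python
import numpy as np


def neighbor_mean(x):
    """Average of the four direct neighbors of every pixel (center excluded)."""
    p = np.pad(x, 1, mode="reflect")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0


def toy_noise_predictor(x):
    """Hypothetical stand-in for the network: noise = input minus neighbor mean."""
    return x - neighbor_mean(x)


def residual_denoise(noisy, predict_noise):
    """Residual learning: the model predicts the noise, which is then subtracted."""
    return noisy - predict_noise(noisy)


def blind_spot_reconstruct(noisy, predict_noise, cell=2):
    """Assumed grid masking: hide one position per cell x cell block, denoise,
    and keep only the outputs at the hidden (blind-spot) positions."""
    h, w = noisy.shape
    if h % cell or w % cell:
        raise ValueError("image sides must be divisible by cell")
    out = np.zeros_like(noisy)
    for dy in range(cell):
        for dx in range(cell):
            mask = np.zeros((h, w), dtype=bool)
            mask[dy::cell, dx::cell] = True
            masked = noisy.copy()
            masked[mask] = 0.0  # the model never sees these pixels
            denoised = residual_denoise(masked, predict_noise)
            out[mask] = denoised[mask]  # collect the blind spots
    return out
```

Each pass hides a different position of every block, so after `cell * cell` passes every pixel has been predicted exactly once from its surroundings alone. With the toy predictor, the reconstructed image is simply the neighbor average of the noisy input; a trained network would replace `toy_noise_predictor`.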
