This work aims to improve the applicability of diffusion models to realistic image restoration. Specifically, we enhance the diffusion model in several respects: network architecture, noise level, number of denoising steps, training image size, and optimizer/scheduler. We show that tuning these hyperparameters yields better performance on both distortion and perceptual scores. We also propose a U-Net based latent diffusion model that performs diffusion in a low-resolution latent space while preserving high-resolution information from the original input for the decoding process. Compared to the previous latent diffusion model, which trains a VAE-GAN to compress the image, our U-Net compression strategy is significantly more stable and can recover highly accurate images without relying on adversarial optimization. Importantly, these modifications allow us to apply diffusion models to various image restoration tasks, including real-world shadow removal, high-resolution (HR) non-homogeneous dehazing, stereo super-resolution, and bokeh effect transformation. By simply replacing the datasets and slightly changing the noise network, our model, named Refusion, can handle large images (e.g., 6000 × 4000 × 3 in HR dehazing) and produces good results on all of the above restoration problems. Refusion achieves the best perceptual performance in the NTIRE 2023 Image Shadow Removal Challenge and wins 2nd place overall.
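The idea of restoring in a low-resolution latent space while keeping high-resolution information from the input for decoding can be illustrated with a toy, non-learned sketch. It is not the Refusion implementation: a 1-D list stands in for an image, average pooling stands in for the learned U-Net encoder, the stored detail stands in for a skip connection, and `restore_in_latent` is a placeholder for the latent diffusion process. All function names are hypothetical.

```python
# Toy, non-learned stand-in for U-Net compression with a skip connection.
# All function names are hypothetical and chosen for illustration only.

def encode(signal, factor=2):
    """Average-pool to a low-resolution latent and keep the lost detail."""
    if len(signal) % factor != 0:
        raise ValueError("signal length must be divisible by factor")
    latent = [
        sum(signal[i:i + factor]) / factor
        for i in range(0, len(signal), factor)
    ]
    # Detail the latent cannot represent; plays the role of a skip feature.
    upsampled = [v for v in latent for _ in range(factor)]
    skip = [s - u for s, u in zip(signal, upsampled)]
    return latent, skip


def decode(latent, skip, factor=2):
    """Upsample the latent and add back the high-resolution skip detail."""
    upsampled = [v for v in latent for _ in range(factor)]
    return [u + d for u, d in zip(upsampled, skip)]


def restore_in_latent(latent, offset):
    """Placeholder for latent diffusion: removes a known constant offset."""
    return [v - offset for v in latent]


if __name__ == "__main__":
    clean = [1.0, 3.0, 2.0, 6.0]
    degraded = [c + 0.5 for c in clean]  # toy degradation: uniform brightening
    latent, skip = encode(degraded)
    restored = decode(restore_in_latent(latent, 0.5), skip)
    print(restored)
```

In this sketch the restoration step only touches the half-length latent, while the fine detail travels around it unchanged and is added back at decoding time. A learned encoder and decoder would replace the fixed pooling and upsampling used here.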