Adapting the Diffusion Probabilistic Model (DPM) for direct image super-resolution is wasteful, given that a simple Convolutional Neural Network (CNN) can recover the main low-frequency content. Therefore, we present ResDiff, a novel Diffusion Probabilistic Model based on a residual structure for Single Image Super-Resolution (SISR). ResDiff combines a CNN, which restores the primary low-frequency components, with a DPM, which predicts the residual between the ground-truth image and the CNN-predicted image. In contrast to common diffusion-based methods that directly use the LR image to guide the noise toward HR space, ResDiff uses the CNN's initial prediction to direct the noise toward the residual space between the HR space and the CNN-predicted space, which not only accelerates the generation process but also yields superior sample quality. Additionally, a frequency-domain-based loss function is introduced for the CNN to facilitate its restoration, and a frequency-domain guided diffusion is designed for the DPM to predict high-frequency details. Extensive experiments on multiple benchmark datasets demonstrate that ResDiff outperforms previous diffusion-based methods, achieving faster model convergence, superior generation quality, and more diverse samples.
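To make the residual structure concrete, the following is a minimal PyTorch sketch of one training step under stated assumptions: `cnn` and `eps_model` are hypothetical modules standing in for the paper's CNN and noise-prediction network, the conditioning is done here by channel-concatenating the CNN prediction, and the frequency-domain loss is written as a plain L1 distance between FFT magnitudes; the abstract does not specify these exact forms.

```python
import torch
import torch.nn.functional as F

def frequency_loss(pred, target):
    """Hypothetical form of the frequency-domain loss: L1 distance
    between the 2D FFT magnitudes of the prediction and the ground
    truth, pushing the CNN to match spectra, not just pixels."""
    return F.l1_loss(torch.fft.fft2(pred, norm="ortho").abs(),
                     torch.fft.fft2(target, norm="ortho").abs())

def residual_diffusion_loss(cnn, eps_model, lr, hr, alphas_bar):
    """One training step: the DPM is trained to denoise the residual
    between the ground truth and the CNN prediction, conditioned on
    that prediction rather than on the raw LR image."""
    # 1) CNN restores the main low-frequency content from the LR input.
    up = F.interpolate(lr, size=hr.shape[-2:], mode="bicubic",
                       align_corners=False)
    cnn_pred = cnn(up)

    # 2) The diffusion target is the residual, not the HR image itself.
    residual = hr - cnn_pred

    # 3) Standard DDPM forward process applied to the residual.
    b = hr.shape[0]
    t = torch.randint(0, alphas_bar.shape[0], (b,), device=hr.device)
    a_bar = alphas_bar[t].view(b, 1, 1, 1)
    noise = torch.randn_like(residual)
    x_t = a_bar.sqrt() * residual + (1 - a_bar).sqrt() * noise

    # 4) The noise predictor sees the CNN prediction as conditioning,
    #    steering the noise toward the residual space.
    eps_hat = eps_model(torch.cat([x_t, cnn_pred], dim=1), t)
    return F.mse_loss(eps_hat, noise) + frequency_loss(cnn_pred, hr)
```

Because the residual carries far less energy than the full HR image, the denoiser's task is easier, which is consistent with the faster convergence and shorter generation reported above.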