Diffusion models (DMs) have achieved SOTA performance in image synthesis by modeling the synthesis process as the sequential application of a denoising network. However, unlike image synthesis, which generates every pixel from scratch, most pixels in image restoration (IR) are already given. For IR, it is therefore inefficient for traditional DMs to run massive iterations on a large model to estimate whole images or feature maps. To address this issue, we propose an efficient DM for IR (DiffIR), which consists of a compact IR prior extraction network (CPEN), a dynamic IR transformer (DIRformer), and a denoising network. Specifically, DiffIR has two training stages: pretraining and training the DM. In pretraining, we feed ground-truth images into CPEN$_{S1}$ to capture a compact IR prior representation (IPR) that guides the DIRformer. In the second stage, we train the DM to directly estimate the same IPR as the pretrained CPEN$_{S1}$, using only LQ images. We observe that, since the IPR is only a compact vector, DiffIR needs far fewer iterations than traditional DMs to obtain accurate estimations and produces more stable and realistic results. Because only a few iterations are needed, DiffIR can jointly optimize CPEN$_{S2}$, the DIRformer, and the denoising network, which further reduces the influence of estimation errors. We conduct extensive experiments on several IR tasks and achieve SOTA performance at a lower computational cost.
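The two-stage pipeline described above can be sketched as follows. This is a minimal, illustrative PyTorch rendering based only on this abstract: the module internals of CPEN, DIRformer, and the denoising network, the dimension `IPR_DIM`, the number of diffusion steps, the simplified reverse update, and the loss choices are all assumptions, not the authors' implementation.

```python
# Minimal sketch of DiffIR's two-stage training, assuming placeholder module
# internals and an L1-based objective. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

IPR_DIM = 256          # assumed size of the compact IR prior representation (IPR)
NUM_DIFF_STEPS = 4     # few reverse steps, since the IPR is only a compact vector

class CPEN(nn.Module):
    """Compact IR Prior Extraction Network: image -> compact IPR vector (placeholder)."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, IPR_DIM))
    def forward(self, x):
        return self.body(x)

class DIRformer(nn.Module):
    """Dynamic IR transformer, conditioned on the IPR (placeholder: IPR-modulated conv)."""
    def __init__(self):
        super().__init__()
        self.to_scale = nn.Linear(IPR_DIM, 64)
        self.head = nn.Conv2d(3, 64, 3, padding=1)
        self.tail = nn.Conv2d(64, 3, 3, padding=1)
    def forward(self, lq, ipr):
        feat = self.head(lq)
        scale = self.to_scale(ipr).unsqueeze(-1).unsqueeze(-1)   # [B, 64, 1, 1]
        return self.tail(feat * (1 + scale)) + lq

class Denoiser(nn.Module):
    """Denoising network acting on the compact IPR, conditioned on an LQ embedding."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * IPR_DIM + 1, 256), nn.ReLU(),
                                 nn.Linear(256, IPR_DIM))
    def forward(self, z_t, cond, t):
        t_emb = torch.full((z_t.size(0), 1), float(t), device=z_t.device)
        return self.mlp(torch.cat([z_t, cond, t_emb], dim=1))

def stage1_step(cpen_s1, dirformer, gt, lq, opt):
    """Pretraining: CPEN_S1 extracts the IPR from the GT image to guide DIRformer."""
    ipr = cpen_s1(gt)
    out = dirformer(lq, ipr)
    loss = F.l1_loss(out, gt)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def stage2_step(cpen_s1, cpen_s2, denoiser, dirformer, gt, lq, opt):
    """Stage 2: a short reverse process estimates the IPR from the LQ image only;
    CPEN_S2, the denoising network, and DIRformer are optimized jointly."""
    with torch.no_grad():
        ipr_gt = cpen_s1(gt)                    # target IPR from the frozen stage-1 CPEN
    cond = cpen_s2(lq)                          # LQ-only conditioning
    z = torch.randn_like(ipr_gt)                # start the reverse process from noise
    for t in reversed(range(NUM_DIFF_STEPS)):   # only a few iterations are run
        z = denoiser(z, cond, t)                # simplified reverse update
    out = dirformer(lq, z)
    loss = F.l1_loss(z, ipr_gt) + F.l1_loss(out, gt)   # IPR match + restoration loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy usage with random tensors (shapes only; not real data):
cpen_s1, cpen_s2 = CPEN(), CPEN()
dirformer, denoiser = DIRformer(), Denoiser()
gt, lq = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
opt1 = torch.optim.Adam(list(cpen_s1.parameters()) + list(dirformer.parameters()), lr=2e-4)
stage1_step(cpen_s1, dirformer, gt, lq, opt1)
opt2 = torch.optim.Adam(list(cpen_s2.parameters()) + list(denoiser.parameters())
                        + list(dirformer.parameters()), lr=2e-4)
stage2_step(cpen_s1, cpen_s2, denoiser, dirformer, gt, lq, opt2)
```

The key design point reflected here is that the diffusion process runs in the compact IPR space rather than over full images or feature maps, which is why a handful of reverse steps suffices and why joint optimization of CPEN$_{S2}$, the DIRformer, and the denoising network remains tractable.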