We present a novel method for exemplar-based image translation, called matching interleaved diffusion models (MIDMs). Most existing methods for this task were formulated as a GAN-based matching-then-generation framework. In this framework, however, matching errors induced by the difficulty of semantic matching across domains, e.g., sketch and photo, are easily propagated to the generation step, which in turn degrades the results. Motivated by the recent success of diffusion models in overcoming the shortcomings of GANs, we incorporate diffusion models to address these limitations. Specifically, we formulate a diffusion-based matching-and-generation framework that interleaves cross-domain matching and diffusion steps in the latent space: the intermediate warp is iteratively fed into the noising process and then denoised to generate the translated image. In addition, to improve the reliability of the diffusion process, we design a confidence-aware process using cycle consistency so that only confident regions are considered during translation. Experimental results show that our MIDMs generate more plausible images than state-of-the-art methods.
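The sketch below illustrates the interleaved matching-and-diffusion loop described above; it is a minimal illustration, not the paper's implementation. It assumes a DDPM-style noise schedule, and the `matcher` and `denoiser` callables are hypothetical stand-ins for the cross-domain matching module and the latent diffusion denoiser, whose exact interfaces are assumptions.

```python
import torch

def interleaved_translation(src_latent, exemplar_latent, matcher, denoiser,
                            alphas_cumprod, steps):
    """Alternate warping (matching) and denoising in latent space.

    src_latent:      latent features of the condition image (e.g., a sketch)
    exemplar_latent: latent features of the exemplar photo
    matcher:         warps the exemplar toward the given semantic layout
    denoiser:        predicts the clean latent from a noisy one at step t
    alphas_cumprod:  cumulative noise-schedule products, tensor of shape [T]
    steps:           descending timesteps at which matching is interleaved
    """
    # Start from an initial warp of the exemplar onto the source structure.
    warped = matcher(src_latent, exemplar_latent)
    for t in steps:
        a_t = alphas_cumprod[t]
        # Noising: diffuse the current warp to level t, i.e., sample q(x_t | x_0).
        noise = torch.randn_like(warped)
        x_t = a_t.sqrt() * warped + (1.0 - a_t).sqrt() * noise
        # Denoising: estimate the clean latent at this step.
        x0_hat = denoiser(x_t, t, cond=src_latent)
        # Matching: re-warp the exemplar using the refined estimate, so that
        # matching errors are corrected instead of propagated to generation.
        warped = matcher(x0_hat, exemplar_latent)
    return warped
```

Feeding the refined estimate back into the matcher is what distinguishes this interleaved scheme from a one-shot matching-then-generation pipeline.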
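The confidence-aware process can likewise be sketched via a forward-backward (cycle) consistency check: a correspondence is trusted only if warping forward and then backward returns approximately to the starting position. This is a hedged illustration; the flow representation (normalized coordinates) and the threshold `tau` are assumptions, not the paper's exact choices.

```python
import torch
import torch.nn.functional as F

def cycle_confidence(flow_fwd, flow_bwd, tau=0.05):
    """Binary confidence mask from forward-backward flow consistency.

    flow_fwd, flow_bwd: dense correspondence fields, shape [B, 2, H, W],
                        assumed to be offsets in normalized [-1, 1] coords.
    tau:                illustrative error threshold for trusting a match.
    """
    B, _, H, W = flow_fwd.shape
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                            torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)
    # Sample the backward flow at the positions the forward flow points to.
    fwd_grid = grid + flow_fwd.permute(0, 2, 3, 1)
    bwd_at_fwd = F.grid_sample(flow_bwd, fwd_grid, align_corners=True)
    # The composed displacement should be ~0 where matching is reliable.
    cycle = flow_fwd + bwd_at_fwd
    err = cycle.norm(dim=1, keepdim=True)  # [B, 1, H, W]
    return (err < tau).float()             # 1 = confident, 0 = unreliable
```

One plausible use of such a mask is to gate the warp during translation, e.g., `warped = mask * warped + (1 - mask) * x0_hat`, so that unreliable correspondences fall back to the denoised estimate rather than corrupting the generation.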