We present a novel method for exemplar-based image translation, called matching interleaved diffusion models (MIDMs). Most existing methods for this task were formulated as a GAN-based matching-then-generation framework. However, in this framework, matching errors induced by the difficulty of semantic matching across domains, e.g., sketch and photo, can easily propagate to the generation step, which in turn leads to degenerate results. Motivated by the recent success of diffusion models in overcoming the shortcomings of GANs, we incorporate diffusion models to overcome these limitations. Specifically, we formulate a diffusion-based matching-and-generation framework that interleaves cross-domain matching and diffusion steps in the latent space, iteratively feeding the intermediate warp into the noising process and denoising it to generate a translated image. In addition, to improve the reliability of the diffusion process, we design a confidence-aware process using cycle-consistency, so that only confident regions are considered during translation. Experimental results show that our MIDMs generate more plausible images than state-of-the-art methods.
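To make the interleaving concrete, the following is a minimal PyTorch-style sketch of one possible reading of the loop described above. All module names (`encoder`, `matcher`, `denoiser`, `scheduler`) and their interfaces are hypothetical placeholders introduced for illustration, not the actual MIDMs implementation.

```python
import torch

def midm_translate(cond_img, exemplar_img, encoder, denoiser, matcher, scheduler):
    """Hypothetical sketch of an interleaved matching-and-diffusion loop.

    Assumed interfaces (placeholders, not the paper's code):
      encoder:   maps an image to a latent tensor.
      matcher:   estimates cross-domain correspondence, returning the exemplar
                 latent warped to the condition's layout and a cycle-consistency
                 confidence mask in [0, 1].
      denoiser:  a latent diffusion model returning a denoised latent estimate.
      scheduler: exposes `timesteps` (high to low noise) and
                 `add_noise(x, noise, t)`, as in DDIM-style schedulers.
    """
    z_cond = encoder(cond_img)        # condition (e.g., sketch) latent
    z_exem = encoder(exemplar_img)    # exemplar (e.g., photo) latent

    # Initial warp: align the exemplar latent to the condition's structure.
    z_warp, conf = matcher(z_cond, z_exem)

    for t in scheduler.timesteps:
        # Noising: feed the current intermediate warp into the diffusion trajectory.
        noise = torch.randn_like(z_warp)
        z_t = scheduler.add_noise(z_warp, noise, t)

        # Denoising step, conditioned on the structure latent.
        z_denoised = denoiser(z_t, t, cond=z_cond)

        # Re-estimate the correspondence against the partially denoised latent,
        # keeping only confident regions via the cycle-consistency mask.
        z_rewarp, conf = matcher(z_cond, z_denoised)
        z_warp = conf * z_rewarp + (1 - conf) * z_denoised

    return z_warp  # decode with the autoencoder's decoder to obtain the image
```

The key design point this sketch tries to capture is that matching is not a one-shot preprocessing step: the warp is re-estimated after every denoising step, so correspondence errors can be corrected as the latent becomes cleaner, and the confidence mask limits how much unreliable warped content enters each iteration.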