Image-to-image translation is an important and challenging problem in computer vision. Existing approaches such as Pixel2Pixel and DualGAN suffer from the training instability of GANs and fail to generate diverse outputs because they model the task as a one-to-one mapping. Although diffusion models can generate images with high quality and diversity, current conditional diffusion models still cannot maintain high similarity to the condition image in image-to-image translation tasks, owing to the Gaussian noise added in the reverse process. To address these issues, this paper proposes a novel Vector Quantized Brownian Bridge (VQBB) diffusion model. On one hand, the Brownian Bridge diffusion process can model the transformation between two domains more accurately and flexibly than existing Markov diffusion methods. To the best of the authors' knowledge, this is the first work to propose a Brownian Bridge diffusion process for image-to-image translation. On the other hand, the proposed method improves learning efficiency and translation accuracy by confining the diffusion process to the quantized latent space. Finally, experimental results validate the performance of the proposed method.
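The abstract does not give the exact schedule used by VQBB, so the following is only a minimal sketch of the textbook Brownian bridge it builds on: a process pinned to one endpoint at t = 0 and to another at t = T, whose marginal at time t has mean (1 - t/T)·a + (t/T)·b and variance t(T - t)/T. The function names, the `scale` parameter, and the toy latent vectors are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def bridge_marginal(a, b, t, T, scale=1.0):
    """Mean and variance of a Brownian bridge pinned at a (t=0) and b (t=T).

    `scale` is a hypothetical knob for the noise level, not from the paper.
    """
    if not 0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    m = t / T  # interpolation weight between the two endpoints
    mean = [(1.0 - m) * ai + m * bi for ai, bi in zip(a, b)]
    var = scale * t * (T - t) / T  # vanishes at both endpoints
    return mean, var


def sample_bridge(a, b, t, T, rng, scale=1.0):
    """Draw one sample of the bridge state at time t."""
    mean, var = bridge_marginal(a, b, t, T, scale)
    std = math.sqrt(var)
    return [mu + std * rng.gauss(0.0, 1.0) for mu in mean]


# Toy latent vectors standing in for encodings of the two image domains.
rng = random.Random(0)
a = [0.0, 1.0, -1.0]
b = [1.0, 1.0, 0.5]
T = 1000

start = sample_bridge(a, b, 0, T, rng)    # equals a: no noise at t = 0
end = sample_bridge(a, b, T, T, rng)      # equals b: no noise at t = T
middle = sample_bridge(a, b, T // 2, T, rng)  # noisiest point of the bridge
```

Because the variance is zero at both ends, every trajectory starts exactly at one latent and ends exactly at the other. This contrasts with a standard diffusion process, whose terminal state is pure Gaussian noise, and it illustrates why a bridge formulation is a natural fit for translating between two fixed domains.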