Diffusion models have shown great promise in text-guided image style transfer, but their stochastic nature creates a trade-off between style transformation and content preservation. Existing methods require computationally expensive fine-tuning of diffusion models or additional neural networks. To address this, we propose a zero-shot contrastive loss for diffusion models that requires neither fine-tuning nor auxiliary networks. By leveraging a patch-wise contrastive loss between generated samples and original image embeddings in the pre-trained diffusion model, our method can generate images with the same semantic content as the source image in a zero-shot manner. Our approach outperforms existing methods while preserving content and requiring no additional training, not only for image style transfer but also for image-to-image translation and manipulation. Experimental results validate the effectiveness of the proposed method.
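To make the patch-wise contrastive idea concrete, below is a minimal sketch of a CUT-style InfoNCE loss computed between spatial features of the generated sample and the source image. It assumes the features come from the same intermediate layer of a frozen pre-trained diffusion network; the feature layer, patch-sampling strategy, and temperature are illustrative assumptions, not the exact implementation described in the abstract.

```python
import torch
import torch.nn.functional as F


def patchwise_contrastive_loss(feat_gen, feat_src, tau=0.07, num_patches=256):
    """InfoNCE-style patch-wise contrastive loss (illustrative sketch).

    feat_gen, feat_src: (B, C, H, W) feature maps for the generated sample
    and the source image, assumed to be extracted from the same intermediate
    layer of a frozen pre-trained diffusion model. Each generated patch is
    pulled toward the source patch at the same spatial location (positive)
    and pushed away from source patches at other locations (negatives).
    """
    B, C, H, W = feat_gen.shape
    # Flatten spatial dimensions: (B, H*W, C)
    q = feat_gen.flatten(2).permute(0, 2, 1)
    k = feat_src.flatten(2).permute(0, 2, 1)

    # Randomly sample a subset of patch locations to keep the loss cheap.
    n = min(num_patches, H * W)
    idx = torch.randperm(H * W, device=feat_gen.device)[:n]
    q = F.normalize(q[:, idx], dim=-1)  # (B, n, C)
    k = F.normalize(k[:, idx], dim=-1)  # (B, n, C)

    # Similarity of every generated patch to every source patch: (B, n, n)
    logits = torch.bmm(q, k.transpose(1, 2)) / tau
    # The positive for generated patch i is source patch i (same location).
    targets = torch.arange(n, device=feat_gen.device).expand(B, n)
    return F.cross_entropy(logits.flatten(0, 1), targets.flatten())
```

In this sketch the loss is added to the sampling objective so that content-bearing patches of the generated image stay aligned with the source while the text prompt drives the style; the exact layers and weighting used in practice would follow the paper rather than this example.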