Diffusion models have shown great promise in text-guided image style transfer, but their stochastic nature introduces a trade-off between style transformation and content preservation. Existing methods require computationally expensive fine-tuning of diffusion models or additional neural networks. To address this, we propose a zero-shot contrastive loss for diffusion models that requires neither additional fine-tuning nor auxiliary networks. By leveraging a patch-wise contrastive loss between generated samples and original image embeddings in the pre-trained diffusion model, our method can generate images with the same semantic content as the source image in a zero-shot manner. Our approach outperforms existing methods while preserving content and requiring no additional training, not only for image style transfer but also for image-to-image translation and manipulation. Our experimental results validate the effectiveness of the proposed method.
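To make the core idea concrete, below is a minimal sketch of a patch-wise (InfoNCE-style) contrastive loss in PyTorch, not the paper's exact implementation. It assumes feature maps extracted at matching layers from the generated sample and the source image (e.g., intermediate activations of the pre-trained diffusion U-Net); patches at the same spatial location form positive pairs, while other sampled locations serve as negatives. The function name, sampling scheme, and temperature `tau` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def patchwise_contrastive_loss(gen_feats, src_feats, num_patches=256, tau=0.07):
    """Illustrative patch-wise contrastive loss (sketch, not the paper's code).

    gen_feats, src_feats: tensors of shape (B, C, H, W), e.g. intermediate
    activations taken at matching layers from a pre-trained diffusion U-Net.
    Patches at the same spatial location are positives; all other sampled
    locations in the source act as negatives.
    """
    B, C, H, W = gen_feats.shape
    num_patches = min(num_patches, H * W)

    # Flatten spatial dimensions: (B, H*W, C)
    gen = gen_feats.flatten(2).permute(0, 2, 1)
    src = src_feats.flatten(2).permute(0, 2, 1)

    # Sample the same spatial locations from both feature maps.
    idx = torch.randperm(H * W, device=gen_feats.device)[:num_patches]
    gen = F.normalize(gen[:, idx], dim=-1)   # (B, N, C)
    src = F.normalize(src[:, idx], dim=-1)   # (B, N, C)

    # Cosine similarity of every generated patch to every source patch: (B, N, N).
    logits = torch.bmm(gen, src.transpose(1, 2)) / tau

    # The diagonal (same spatial location) is the positive pair.
    labels = torch.arange(num_patches, device=gen_feats.device).expand(B, -1)
    return F.cross_entropy(logits.flatten(0, 1), labels.flatten())
```

In a zero-shot setting, a loss of this form can be evaluated on features of the reverse-diffusion estimate at each sampling step and its gradient used to guide the update, so that content is preserved without fine-tuning the diffusion model or training any auxiliary network.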