Diffusion models are recent generative models that have achieved state-of-the-art performance in image generation. However, relatively little research has been conducted on image manipulation with diffusion models. Here, we present DiffusionCLIP, a novel method that performs text-driven image manipulation with diffusion models using a Contrastive Language-Image Pre-training (CLIP) loss. Our method achieves performance comparable to that of modern GAN-based image processing methods on in-domain and out-of-domain image manipulation tasks, with the advantage of near-perfect inversion even without additional encoders or optimization. Furthermore, our method can be readily applied to various novel applications, such as translating images from one unseen domain to another unseen domain, or stroke-conditioned image generation in an unseen domain. Finally, we present novel multiple-attribute control with DiffusionCLIP by combining multiple fine-tuned diffusion models.
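To make the CLIP-guided fine-tuning concrete, below is a minimal sketch of a directional CLIP loss of the kind used to steer a diffusion model toward a text-described edit: the direction between the source and manipulated image embeddings is aligned with the direction between the source and target text embeddings. It assumes the OpenAI CLIP package (https://github.com/openai/CLIP); the function name `directional_clip_loss` and all variable names are illustrative, not taken from the DiffusionCLIP codebase.

```python
# Hedged sketch: directional CLIP loss for text-driven image manipulation.
# Assumes `pip install git+https://github.com/openai/CLIP.git` and PyTorch.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()  # CLIP stays frozen; only the diffusion model is fine-tuned

def directional_clip_loss(x_src, x_gen, text_src, text_tgt):
    """1 - cosine similarity between the image-space edit direction and the
    text-space edit direction in CLIP embedding space.

    x_src, x_gen: CLIP-preprocessed image batches (B, 3, 224, 224);
    x_gen is the diffusion model's output, so gradients flow through it.
    """
    with torch.no_grad():
        t_src = clip_model.encode_text(clip.tokenize([text_src]).to(device))
        t_tgt = clip_model.encode_text(clip.tokenize([text_tgt]).to(device))
    i_src = clip_model.encode_image(x_src)  # source image embedding
    i_gen = clip_model.encode_image(x_gen)  # manipulated image embedding

    d_text = t_tgt - t_src                  # edit direction in text space
    d_img = i_gen - i_src                   # edit direction in image space
    d_text = d_text / d_text.norm(dim=-1, keepdim=True)
    d_img = d_img / d_img.norm(dim=-1, keepdim=True)
    return (1.0 - (d_img * d_text).sum(dim=-1)).mean()

# Illustrative usage: penalize the model when the image edit drifts away
# from the text direction, e.g. "face" -> "smiling face".
# loss = directional_clip_loss(x_source, x_edited, "face", "smiling face")
```

A directional (rather than global) loss of this form is commonly preferred because it rewards moving *along* the text-specified attribute direction instead of collapsing all outputs onto the single image that best matches the target prompt.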