Recently, CLIP-guided image synthesis has shown appealing performance in adapting a pre-trained source-domain generator to an unseen target domain. It requires no target-domain samples, only the textual domain labels, and training is highly efficient, taking only a few minutes. However, existing methods still have limitations in the quality of generated images and may suffer from the mode collapse issue. A key reason is that a fixed adaptation direction is applied to all cross-domain image pairs, leading to identical supervision signals, even though the adaptation required may differ greatly from image to image. To address this issue, we propose an Image-specific Prompt Learning (IPL) method, which learns specific prompt vectors for each source-domain image. This produces a more precise adaptation direction for every cross-domain image pair, endowing the target-domain generator with greatly enhanced flexibility. Qualitative and quantitative evaluations on various domains demonstrate that IPL effectively improves the quality and diversity of synthesized images and alleviates mode collapse. Moreover, IPL is independent of the structure of the generative model, applying to both generative adversarial networks and diffusion models. Code is available at https://github.com/Picsart-AI-Research/IPL-Zero-Shot-Generative-Model-Adaptation.
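The contrast between a fixed adaptation direction and an image-specific one can be illustrated with the CLIP-style directional loss. Below is a minimal NumPy sketch: random vectors stand in for CLIP image/text embeddings, and the per-image prompt perturbations are illustrative assumptions, not the authors' implementation (in IPL the prompts pass through CLIP's text encoder, which is nonlinear).

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    """L2-normalize along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def directional_loss(src_img, tgt_img, src_txt, tgt_txt):
    """1 - cosine similarity between the image-space and text-space
    adaptation directions in (stand-in) CLIP embedding space."""
    img_dir = normalize(tgt_img - src_img)
    txt_dir = normalize(tgt_txt - src_txt)
    return 1.0 - float(np.sum(img_dir * txt_dir))

# Stand-ins for CLIP features of N source/target image pairs.
N, D = 4, 512
src_imgs = rng.normal(size=(N, D))
tgt_imgs = rng.normal(size=(N, D))

# Fixed direction: one text pair (e.g. "photo" -> "sketch") is shared by
# all image pairs, so every pair is supervised toward the same direction.
src_txt = rng.normal(size=D)
tgt_txt = rng.normal(size=D)
fixed_losses = [directional_loss(s, t, src_txt, tgt_txt)
                for s, t in zip(src_imgs, tgt_imgs)]

# Image-specific direction (the idea behind IPL): learned prompt vectors
# conditioned on each source image yield a distinct text direction per
# pair. Here the per-image perturbations are random stand-ins for the
# effect of image-specific prompts on the two text embeddings.
src_shift = rng.normal(size=(N, D)) * 0.1
tgt_shift = rng.normal(size=(N, D)) * 0.1
ipl_losses = [directional_loss(s, t, src_txt + ps, tgt_txt + pt)
              for s, t, ps, pt in zip(src_imgs, tgt_imgs,
                                      src_shift, tgt_shift)]
```

Under the fixed scheme every pair is pulled toward one shared target direction; under the image-specific scheme each pair gets its own supervision signal, which is what the abstract credits with improving diversity and alleviating mode collapse.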