DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Positive-Negative Prompt-Tuning

Large-scale text-to-image generation models have achieved remarkable progress in synthesizing high-quality, feature-rich, high-resolution images guided by text. However, these models often struggle with novel concepts, e.g., new styles or object entities. Although recent attempts have employed fine-tuning or prompt-tuning strategies to teach the pre-trained diffusion model novel concepts from a reference image set, they tend to overfit to the given reference images, particularly in one-shot applications, which hinders generating diverse and high-quality images while maintaining generation controllability. To tackle this challenge, we present a simple yet effective method called DreamArtist, which employs a positive-negative prompt-tuning learning strategy. Specifically, DreamArtist incorporates both positive and negative embeddings and jointly trains them. The positive embedding aggressively captures the salient characteristics of the reference image to drive diversified generation, and the negative embedding rectifies inadequacies from the positive embedding. It learns not only what is correct, but also what can be avoided or improved. We have conducted extensive experiments and evaluated the proposed method in terms of image similarity and diversity, generation controllability, and style cloning. DreamArtist achieves superior generation performance over existing methods. In addition, our evaluation on extended tasks, including concept composition and prompt-guided image editing, demonstrates its effectiveness for further applications.

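Translation: Large-scale text-to-image models have made remarkable progress in synthesizing high-resolution, feature-rich, high-quality images under text guidance. However, these models often have difficulty with novel concepts, such as new styles or object entities. Recent attempts have used fine-tuning or prompt-tuning strategies to teach a pre-trained diffusion model new concepts from a set of reference images, but they share a drawback: in one-shot settings they overfit to the given reference images, which works against generating diverse, high-quality images while keeping generation controllable. To address this challenge, we propose a simple yet effective method named DreamArtist, which adopts a positive-negative prompt-tuning learning strategy. Specifically, DreamArtist contains both a positive and a negative embedding and trains them jointly. The positive embedding aggressively captures the salient features of the reference image to drive diversified generation, while the negative embedding corrects the shortcomings of the positive embedding. It learns not only what is correct, but also what should be avoided or improved. We conducted extensive experiments and evaluated the proposed method in terms of image similarity and diversity, generation controllability, and style cloning. DreamArtist outperforms existing methods in generation performance. In addition, our further evaluation on extended tasks, including concept composition and prompt-guided image editing, shows that it is effective for more applications.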
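The joint training of a positive and a negative embedding described in the abstract can be illustrated with a toy sketch. The abstract does not say how the two embeddings are combined, so the sketch assumes a classifier-free-guidance-style combination, in which the negative-conditioned prediction serves as the baseline and the positive-conditioned prediction is extrapolated from it. The function names, the linear `frozen_denoiser`, the guidance `scale`, and all numbers are hypothetical stand-ins, not the authors' implementation.

```python
# Toy sketch of positive-negative prompt-tuning (assumptions, not the paper's code).
# The "model" is frozen; only the two embeddings are updated.

def frozen_denoiser(z, emb):
    # Hypothetical stand-in for a frozen, pre-trained noise predictor.
    return [0.5 * zi + ei for zi, ei in zip(z, emb)]

def guided_prediction(z, pos, neg, scale):
    # Assumed combination: start from the negative-conditioned prediction
    # and move toward the positive-conditioned one by a factor `scale`.
    eps_p = frozen_denoiser(z, pos)
    eps_n = frozen_denoiser(z, neg)
    return [n + scale * (p - n) for p, n in zip(eps_p, eps_n)]

def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def train_step(z, target, pos, neg, scale, lr):
    # One joint gradient-descent step on both embeddings (updated in place).
    # For this toy denoiser, d(pred)/d(pos) = scale and d(pred)/d(neg) = 1 - scale.
    pred = guided_prediction(z, pos, neg, scale)
    d = len(pred)
    for i in range(d):
        grad_pred = 2.0 * (pred[i] - target[i]) / d
        pos[i] -= lr * grad_pred * scale
        neg[i] -= lr * grad_pred * (1.0 - scale)
    return mse(pred, target)  # loss measured before this step's update

z = [0.2, -0.4, 0.1, 0.3]        # toy noisy latent
target = [1.0, 0.0, -1.0, 0.5]   # toy noise target from the single reference image
pos = [0.0] * 4                  # positive embedding (learned)
neg = [0.0] * 4                  # negative embedding (learned)

losses = [train_step(z, target, pos, neg, scale=3.0, lr=0.1) for _ in range(50)]
final_loss = mse(guided_prediction(z, pos, neg, 3.0), target)
print(losses[0], final_loss)
```

In this toy setup both embeddings receive gradients from the same reconstruction loss but with opposite signs whenever `scale` is greater than 1, so the negative embedding is pushed away from the direction the positive embedding moves in. That is one way to read the abstract's statement that the negative embedding corrects what the positive embedding gets wrong.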