Large-scale text-to-image generation models have evolved rapidly and can now synthesize high-resolution, feature-rich, high-quality images from text guidance. However, they are often confounded by words denoting novel concepts, styles, or object entities, which continually emerge. Although recent attempts use fine-tuning or prompt-tuning to teach the model a new concept as a pseudo-word from a given set of reference images, these methods still struggle to synthesize diverse, high-quality images free of distortion and artifacts, and they also suffer from low controllability. To address these problems, we propose DreamArtist, which employs a contrastive prompt-tuning learning strategy that introduces both positive and negative embeddings as pseudo-words and trains them jointly. The positive embedding aggressively learns the characteristics of the reference image to drive diversified generation, while the negative embedding introspects in a self-supervised manner to rectify, in reverse, the mistakes and inadequacies of the positive embedding. The model thus learns not only what is correct but also what should be avoided. Extensive experiments on image quality and diversity, controllability, model learning, and task expansion demonstrate that our model learns not only concept but also form, content, and context. The pseudo-words of DreamArtist behave much like true words in generating high-quality images.
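To make the positive/negative interplay concrete, the sketch below shows one plausible way such paired embeddings could be combined at sampling time, in the style of classifier-free guidance: the prediction conditioned on the learned negative pseudo-word replaces the usual unconditional branch, steering generation away from the flaws it has absorbed. All names here are illustrative assumptions, not the paper's actual implementation, and scalar lists stand in for noise-prediction tensors.

```python
# Hypothetical sketch of contrastive guidance with a learned negative
# pseudo-word. eps_pos / eps_neg stand in for the denoiser's noise
# predictions conditioned on the positive and negative embeddings.

def contrastive_guidance(eps_pos, eps_neg, scale=5.0):
    """Classifier-free-guidance-style combination:
    eps = eps_neg + scale * (eps_pos - eps_neg).
    The negative branch anchors the prediction; the scaled difference
    pushes the sample toward the positive concept and away from the
    mistakes captured by the negative embedding."""
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]

# Toy per-dimension "noise predictions" (illustrative values only).
eps_pos = [0.2, -0.1, 0.5]  # conditioned on the positive pseudo-word
eps_neg = [0.1, 0.0, 0.4]   # conditioned on the negative pseudo-word
guided = contrastive_guidance(eps_pos, eps_neg, scale=5.0)
```

During training, both embeddings would be optimized jointly against the same reconstruction objective, so the negative embedding learns, in a self-supervised way, exactly the residual errors the positive embedding leaves behind.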