Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for "personalization" of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering, all while preserving the subject's key features. We also provide a new dataset and evaluation protocol for this new task of subject-driven generation. Project page: https://dreambooth.github.io/
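The fine-tuning objective described above can be summarized as a standard diffusion denoising loss on the few subject images (captioned with the unique identifier) plus a weighted prior-preservation term computed on images that the frozen pretrained model generates for the bare class prompt. The sketch below illustrates this structure under stated assumptions: `Denoiser`, the toy noise schedule, and the random tensors are illustrative stand-ins, not the authors' implementation; a real setup would fine-tune a pretrained text-to-image diffusion model and use its actual noise schedule and text encoder.

```python
# Minimal sketch (assumed, not the paper's code) of DreamBooth-style
# fine-tuning with a class-specific prior-preservation loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Denoiser(nn.Module):
    """Toy stand-in for a conditional diffusion denoising network."""
    def __init__(self, img_dim=64, cond_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(img_dim + cond_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, img_dim),
        )

    def forward(self, x_t, cond, t):
        return self.net(torch.cat([x_t, cond, t[:, None].float()], dim=-1))

def diffusion_loss(model, x0, cond, T=1000):
    """Epsilon-prediction loss at a random timestep (toy cosine schedule)."""
    t = torch.randint(0, T, (x0.shape[0],))
    alpha_bar = torch.cos(0.5 * torch.pi * t / T)[:, None] ** 2
    noise = torch.randn_like(x0)
    x_t = alpha_bar.sqrt() * x0 + (1 - alpha_bar).sqrt() * noise
    return F.mse_loss(model(x_t, cond, t), noise)

model = Denoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
lambda_prior = 1.0  # weight of the prior-preservation term

for step in range(100):
    # Few-shot subject images, conditioned on the "a [V] <class>" prompt.
    x_subj, c_subj = torch.randn(4, 64), torch.randn(4, 16)
    # Samples the frozen pretrained model produced for the bare class
    # prompt ("a <class>"); supervising on them preserves the class prior.
    x_prior, c_prior = torch.randn(4, 64), torch.randn(4, 16)

    loss = diffusion_loss(model, x_subj, c_subj) \
         + lambda_prior * diffusion_loss(model, x_prior, c_prior)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Keeping the prior term on the model's own class samples (hence "autogenous") is what lets the fine-tuned model bind the identifier to the subject without drifting away from the broader class prior.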