Recent text-to-image generation models like DreamBooth have made remarkable progress in generating highly customized images of a target subject by fine-tuning an ``expert model'' for that subject from a few examples. However, this process is expensive, since a new expert model must be learned for each subject. In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine-tuning with \emph{in-context} learning. Given a few demonstrations of a new subject, SuTI can instantly generate novel renditions of the subject in different scenes, without any subject-specific optimization. SuTI is powered by \emph{apprenticeship learning}, where a single apprentice model is learned from data generated by a massive number of subject-specific expert models. Specifically, we mine millions of image clusters from the Internet, each centered around a specific visual subject. We use these clusters to train a massive number of expert models, each specialized on a different subject. The apprentice model SuTI then learns to mimic the behavior of these experts through the proposed apprenticeship learning algorithm. SuTI can generate high-quality, customized subject-specific images 20x faster than optimization-based SoTA methods. On the challenging DreamBench and DreamBench-v2 benchmarks, our human evaluation shows that SuTI significantly outperforms existing approaches such as InstructPix2Pix, Textual Inversion, Imagic, Prompt2Prompt, and Re-Imagen, while performing on par with DreamBooth.