Can a text-to-image diffusion model be used as a training objective for adapting a GAN generator to another domain? In this paper, we show that classifier-free guidance can be leveraged as a critic, enabling generators to distill knowledge from large-scale text-to-image diffusion models. Generators can be efficiently shifted into new domains indicated by text prompts, without access to ground-truth samples from the target domains. We demonstrate the effectiveness and controllability of our method through extensive experiments. Although not trained to minimize a CLIP loss, our model achieves equally high CLIP scores and significantly lower FID than prior work on short prompts, and outperforms the baseline both qualitatively and quantitatively on long and complicated prompts. To the best of our knowledge, the proposed method is the first attempt to incorporate large-scale pre-trained diffusion models and distillation sampling for text-driven image generator domain adaptation, and it achieves a quality previously unattainable. Moreover, we extend our work to 3D-aware style-based generators and DreamBooth guidance.
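To make the core idea concrete, below is a minimal, hedged sketch of the kind of objective the abstract describes: a frozen text-conditioned diffusion model, queried with classifier-free guidance, acts as a critic whose noise residual is back-propagated into a generator (a score-distillation-style update). The `TinyGenerator`, `TinyEpsModel`, noise schedule, and hyperparameters are illustrative stand-ins, not the paper's actual models or settings.

```python
# Sketch (not the authors' code): score distillation with classifier-free
# guidance (CFG) used as a critic for adapting a GAN-style generator.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Stand-in for a pre-trained style-based generator being adapted."""
    def __init__(self, z_dim=64, img_res=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * img_res * img_res), nn.Tanh())
        self.img_res = img_res

    def forward(self, z):
        return self.net(z).view(-1, 3, self.img_res, self.img_res)

class TinyEpsModel(nn.Module):
    """Stand-in for a frozen text-conditioned diffusion U-Net (predicts noise)."""
    def __init__(self, img_res=32, emb_dim=16):
        super().__init__()
        self.proj = nn.Linear(emb_dim, 3 * img_res * img_res)
        self.body = nn.Conv2d(3, 3, 3, padding=1)

    def forward(self, x_t, t, text_emb):
        cond = self.proj(text_emb).view_as(x_t)
        return self.body(x_t + cond)

def sds_cfg_loss(generator, eps_model, z, text_emb, uncond_emb,
                 alphas_cumprod, guidance_scale=7.5):
    """One distillation step: noise the generated image, query the frozen
    diffusion model with and without the text condition, and back-propagate
    the CFG-weighted noise residual into the generator."""
    x0 = generator(z)                                    # generated image
    t = torch.randint(0, len(alphas_cumprod), (x0.shape[0],), device=x0.device)
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a_t.sqrt() * x0 + (1 - a_t).sqrt() * noise     # forward diffusion
    with torch.no_grad():                                # the critic is frozen
        eps_cond = eps_model(x_t, t, text_emb)
        eps_uncond = eps_model(x_t, t, uncond_emb)
        eps_cfg = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    grad = eps_cfg - noise                               # SDS-style gradient
    # Surrogate loss whose gradient w.r.t. x0 equals `grad` (up to weighting).
    return (x0 * grad).mean()

# Usage sketch: nudge the generator toward a text prompt over a few steps.
torch.manual_seed(0)
G, eps_model = TinyGenerator(), TinyEpsModel()
alphas_cumprod = torch.linspace(0.9999, 0.01, 1000)      # toy noise schedule
text_emb, uncond_emb = torch.randn(4, 16), torch.zeros(4, 16)
opt = torch.optim.Adam(G.parameters(), lr=1e-4)
for _ in range(3):
    loss = sds_cfg_loss(G, eps_model, torch.randn(4, 64), text_emb,
                        uncond_emb, alphas_cumprod)
    opt.zero_grad(); loss.backward(); opt.step()
```

In a real setting the toy modules above would be replaced by a pre-trained text-to-image diffusion model (frozen) and a pre-trained style-based generator (trainable), with the text prompt specifying the target domain; the essential mechanism is that only the generator receives gradients.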