Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates high-quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals. Code is available at https://github.com/arpitbansal297/Universal-Guided-Diffusion.
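The core idea can be illustrated with a minimal sketch: at each denoising step, form the predicted clean image from the current noisy sample, evaluate a generic guidance loss on that prediction, and fold the loss gradient back into the noise estimate. The sketch below is not the authors' implementation; it uses a closed-form "denoiser" for data distributed as N(0, I) and a quadratic loss as stand-ins for a real network and a real guidance function (segmentation, face recognition, etc.), and all names (`toy_denoiser`, `guidance_grad`, `scale`, `target`) are illustrative assumptions.

```python
import numpy as np

def toy_denoiser(x_t, alpha_bar):
    # Optimal noise prediction when the clean data is N(0, I):
    # eps_hat = sqrt(1 - alpha_bar) * x_t. Stands in for a trained network.
    return np.sqrt(1.0 - alpha_bar) * x_t

def guidance_grad(x0_hat, target):
    # Gradient of a toy quadratic guidance loss L = 0.5 * ||x0_hat - target||^2.
    # Any differentiable guidance signal could supply this gradient instead.
    return x0_hat - target

alpha_bar = 0.5                      # noise level at the current timestep
rng = np.random.default_rng(0)
x_t = rng.standard_normal(4)         # current noisy sample
target = np.ones(4)                  # what the guidance function asks for
scale = 0.5                          # guidance strength (hyperparameter)

eps = toy_denoiser(x_t, alpha_bar)
# Predicted clean image from the current noisy sample.
x0_hat = (x_t - np.sqrt(1.0 - alpha_bar) * eps) / np.sqrt(alpha_bar)
# Guidance-style correction: add the guidance gradient, evaluated on the
# clean prediction, back into the noise estimate -- no retraining involved.
eps_guided = eps + scale * np.sqrt(1.0 - alpha_bar) * guidance_grad(x0_hat, target)
x0_guided = (x_t - np.sqrt(1.0 - alpha_bar) * eps_guided) / np.sqrt(alpha_bar)

print("unguided distance to target:", np.linalg.norm(x0_hat - target))
print("guided distance to target:  ", np.linalg.norm(x0_guided - target))
```

Because the correction moves the clean prediction a fraction of the way toward the target, the guided estimate lands strictly closer to what the guidance function requests, while the frozen denoiser is queried exactly as in ordinary sampling.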