We introduce Delta Denoising Score (DDS), a novel scoring function for text-based image editing that guides minimal modifications of an input image towards the content described in a target prompt. DDS leverages the rich generative prior of text-to-image diffusion models and can be used as a loss term in an optimization problem to steer an image in the direction dictated by a text prompt. DDS builds on the Score Distillation Sampling (SDS) mechanism and adapts it to image editing. We show that using SDS alone often produces blurry outputs that lack detail, because its gradients are noisy. To address this issue, DDS uses a prompt that matches the input image to identify and remove the undesired, erroneous directions of SDS. Our key premise is that SDS should be zero when calculated on a matched pair of prompt and image; if the score is non-zero, its gradients can therefore be attributed to the erroneous component of SDS. Our analysis demonstrates the effectiveness of DDS for text-based image-to-image translation. We further show that DDS can be used to train an effective zero-shot image translation model. Experimental results indicate that DDS outperforms existing methods in terms of stability and quality, highlighting its potential for real-world applications in text-based image editing.
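The correction described above can be illustrated with a toy sketch. It is not the authors' implementation: `toy_eps_pred` is a hypothetical stand-in for a text-conditioned diffusion noise predictor, the linear noise schedule is invented for the example, and sharing the same noise sample and timestep between the two branches is an assumption of this sketch. It shows one reading of the idea: the SDS gradient for the matched (reference image, reference prompt) pair is treated as the erroneous component and subtracted from the SDS gradient for the (edited image, target prompt) pair.

```python
import numpy as np


def toy_eps_pred(z_t, prompt_emb, t):
    """Hypothetical stand-in for a text-conditioned noise predictor."""
    return 0.5 * z_t + 0.1 * t * prompt_emb


def sds_grad(z, prompt_emb, t, eps, eps_pred=toy_eps_pred):
    """SDS-style direction: predicted noise minus the injected noise."""
    alpha = 1.0 - t  # toy noise schedule (assumption), with t in (0, 1)
    z_t = np.sqrt(alpha) * z + np.sqrt(1.0 - alpha) * eps
    return eps_pred(z_t, prompt_emb, t) - eps


def dds_grad(z, z_ref, target_emb, ref_emb, t, eps, eps_pred=toy_eps_pred):
    """Delta of two SDS directions that share the same noise and timestep.

    The second term is computed on the matched (reference image,
    reference prompt) pair and is treated as the erroneous component.
    """
    edit_term = sds_grad(z, target_emb, t, eps, eps_pred)
    ref_term = sds_grad(z_ref, ref_emb, t, eps, eps_pred)
    return edit_term - ref_term


# Usage: start the edited image at the reference image and take one step.
rng = np.random.default_rng(0)
z_ref = rng.standard_normal((4, 4))       # toy "image"
ref_emb = rng.standard_normal((4, 4))     # toy embedding of the matching prompt
target_emb = rng.standard_normal((4, 4))  # toy embedding of the target prompt
eps = rng.standard_normal((4, 4))         # shared noise sample
t = 0.3                                   # shared timestep

z = z_ref.copy()
z = z - 0.1 * dds_grad(z, z_ref, target_emb, ref_emb, t, eps)
```

In this sketch the delta is exactly zero when the target prompt equals the reference prompt and the edited image equals the reference image, mirroring the premise that a matched prompt and image should produce no update.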