Null-text Inversion for Editing Real Images using Guided Diffusion Models

Recent text-guided diffusion models provide powerful image generation capabilities. Considerable effort is currently devoted to modifying such images using text alone, as a means of intuitive and versatile editing. To edit a real image using these state-of-the-art tools, one must first invert the image, together with a meaningful text prompt, into the pretrained model's domain. In this paper, we introduce an accurate inversion technique and thus facilitate intuitive text-based modification of the image. Our proposed inversion consists of two novel key components: (i) Pivotal inversion for diffusion models. While current methods aim at mapping random noise samples to a single input image, we use a single pivotal noise vector for each timestep and optimize around it. We demonstrate that a direct inversion is inadequate on its own, but does provide a good anchor for our optimization. (ii) Null-text optimization, where we only modify the unconditional textual embedding that is used for classifier-free guidance, rather than the input text embedding. This keeps both the model weights and the conditional embedding intact and hence enables prompt-based editing while avoiding cumbersome tuning of the model's weights. Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt edits, showing high-fidelity editing of real images.
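The two components described in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the noise predictor is a hypothetical linear map (`eps_model`, with made-up weights `W_Z` and `W_C`) rather than Stable Diffusion, the noise schedule `ALPHA_BAR` is arbitrary, and the guidance scales (1 for building the pivots, 7.5 for sampling) are assumptions chosen for illustration. Because the toy model is linear, the gradient with respect to the unconditional ("null") embedding is written out by hand; a real setup would rely on automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, EMB, T = 4, 16, 5                      # toy latent size, embedding size, step count
W_Z = 0.1 * rng.normal(size=(DIM, DIM))     # hypothetical linear "network" weights
W_C = 0.5 * rng.normal(size=(DIM, EMB))
ALPHA_BAR = np.linspace(0.99, 0.5, T + 1)   # toy noise schedule; index 0 is the clean latent


def eps_model(z, emb):
    """Toy linear stand-in for the noise-prediction network."""
    return W_Z @ z + W_C @ emb


def guided_eps(z, cond, null, scale):
    """Classifier-free guidance: mix unconditional and conditional predictions."""
    e_u, e_c = eps_model(z, null), eps_model(z, cond)
    return e_u + scale * (e_c - e_u)


def ddim_move(z, eps, t_from, t_to):
    """Deterministic DDIM update between two steps (works in either direction)."""
    a_from, a_to = ALPHA_BAR[t_from], ALPHA_BAR[t_to]
    z0_pred = (z - np.sqrt(1 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * z0_pred + np.sqrt(1 - a_to) * eps


def ddim_invert(z0, cond):
    """Build the pivot trajectory z*_0 .. z*_T using the conditional prediction only."""
    pivots = [z0]
    for t in range(T):
        eps = eps_model(pivots[-1], cond)    # equivalent to guidance scale 1
        pivots.append(ddim_move(pivots[-1], eps, t, t + 1))
    return pivots


def sample(z_T, cond, nulls, scale):
    """Run guided DDIM sampling from z_T with one null embedding per step."""
    z = z_T
    for t in range(T, 0, -1):
        z = ddim_move(z, guided_eps(z, cond, nulls[t], scale), t, t - 1)
    return z


def null_text_optimize(pivots, cond, scale=7.5, iters=300):
    """Tune only the per-step null embedding so sampling tracks the pivots."""
    z, null = pivots[T], np.zeros(EMB)
    nulls, losses = [None] * (T + 1), []
    for t in range(T, 0, -1):
        a_from, a_to = ALPHA_BAR[t], ALPHA_BAR[t - 1]
        # Coefficient multiplying eps inside ddim_move for this step.
        k = np.sqrt(1 - a_to) - np.sqrt(a_to * (1 - a_from) / a_from)
        jac = k * (1 - scale) * W_C                   # d z_{t-1} / d null (toy model is linear)
        lr = 1.0 / (2 * np.linalg.norm(jac, 2) ** 2)  # 1/L step size for the quadratic loss

        def step_loss(n):
            z_prev = ddim_move(z, guided_eps(z, cond, n, scale), t, t - 1)
            return z_prev, float(np.sum((z_prev - pivots[t - 1]) ** 2))

        _, before = step_loss(null)
        for _ in range(iters):
            z_prev, _ = step_loss(null)
            null = null - lr * 2 * jac.T @ (z_prev - pivots[t - 1])
        z_prev, after = step_loss(null)
        losses.append((before, after))
        nulls[t] = null.copy()
        z = z_prev                                    # continue from the achieved latent
    return nulls, losses


z0, cond = rng.normal(size=DIM), rng.normal(size=EMB)
pivots = ddim_invert(z0, cond)
nulls, losses = null_text_optimize(pivots, cond)
fixed_nulls = [np.zeros(EMB)] * (T + 1)
err_fixed = np.linalg.norm(sample(pivots[T], cond, fixed_nulls, 7.5) - z0)
err_tuned = np.linalg.norm(sample(pivots[T], cond, nulls, 7.5) - z0)
print(f"reconstruction error, fixed null: {err_fixed:.3g}, tuned null: {err_tuned:.3g}")
```

In this sketch the weights `W_Z`, `W_C` and the conditional embedding `cond` are never updated; only the per-step null embeddings change, which mirrors the design choice the abstract highlights. The printed errors compare guided sampling with a fixed null embedding against sampling with the tuned ones.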

