Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard for users to come up with a perfect text prompt that accurately describes every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting. We first automatically discover editing directions that reflect desired edits in the text embedding space. To preserve the general content structure after editing, we further propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. In addition, our method does not need additional training for these edits and can directly use the existing pre-trained text-to-image diffusion model. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing.
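As a rough illustration of the first step (discovering an editing direction in the text embedding space), the following is a minimal sketch, not the authors' released implementation. It assumes the CLIP text encoder used by Stable Diffusion ("openai/clip-vit-large-patch14") and a tiny hand-written sentence bank standing in for the larger, automatically generated one described in the paper; the cat→dog concept pair and all sentences are illustrative.

```python
# Minimal sketch: estimate an editing direction between two concepts as the
# difference of mean CLIP text embeddings over a bank of sentences.
# Assumptions (not from the paper's code): the model name, the small
# hand-written sentence bank, and the cat -> dog concept pair.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)

@torch.no_grad()
def mean_embedding(sentences):
    """Encode a list of sentences and average their per-token embeddings."""
    tokens = tokenizer(sentences, padding="max_length",
                       max_length=tokenizer.model_max_length,
                       truncation=True, return_tensors="pt").to(device)
    emb = text_encoder(**tokens).last_hidden_state   # (batch, seq_len, dim)
    return emb.mean(dim=0)                            # average over the sentence bank

# Small illustrative sentence banks; the paper uses many diverse sentences
# per concept (e.g., generated with an off-the-shelf language model).
contexts = ["sitting on a sofa", "in the garden", "looking at the camera"]
source_sentences = [f"a photo of a cat {c}" for c in contexts]
target_sentences = [f"a photo of a dog {c}" for c in contexts]

# The edit direction is the difference of the mean embeddings; at edit time
# it is added to the text embedding that conditions the diffusion model.
edit_direction = mean_embedding(target_sentences) - mean_embedding(source_sentences)
```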
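The cross-attention guidance idea can likewise be sketched with a toy stand-in for a single cross-attention layer rather than the full Stable Diffusion UNet. The tensor shapes, random inputs, guidance step size, and the omission of the actual DDIM update are simplifying assumptions; only the guidance mechanism itself, a gradient step on the noisy latent that keeps its cross-attention maps close to the reference maps recorded from the input image, mirrors the description above.

```python
# Toy sketch of cross-attention guidance with a stand-in attention layer.
# Reference maps come from reconstructing the input with the original prompt;
# the editing pass is conditioned on the edited embedding but nudged so its
# cross-attention maps stay close to those reference maps.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, tokens, hw = 64, 77, 16 * 16     # feature dim, text tokens, flattened spatial size

# Toy projections standing in for one cross-attention layer of the UNet.
to_q = torch.nn.Linear(dim, dim, bias=False)
to_k = torch.nn.Linear(dim, dim, bias=False)

def cross_attention_maps(latent_feats, text_emb):
    """Return softmax(Q K^T / sqrt(d)) between image features and text tokens."""
    q = to_q(latent_feats)                               # (hw, dim)
    k = to_k(text_emb)                                   # (tokens, dim)
    return torch.softmax(q @ k.t() / dim ** 0.5, dim=-1) # (hw, tokens)

# Reference maps from the reconstruction pass with the original prompt.
with torch.no_grad():
    ref_latent = torch.randn(hw, dim)
    original_emb = torch.randn(tokens, dim)
    ref_maps = cross_attention_maps(ref_latent, original_emb)

# Editing pass: a stand-in edited embedding, guided toward the reference maps.
edited_emb = original_emb + 0.1 * torch.randn(tokens, dim)
latent = ref_latent.clone().requires_grad_(True)
guidance_lr = 0.1                                        # illustrative step size

for step in range(10):
    maps = cross_attention_maps(latent, edited_emb)
    loss = F.mse_loss(maps, ref_maps)                    # preserve input structure
    grad, = torch.autograd.grad(loss, latent)
    with torch.no_grad():
        latent -= guidance_lr * grad                     # gradient step on the latent
    # ...followed here, in the real sampler, by the usual denoising update on `latent`.
```

In the actual method this guidance step is interleaved with the diffusion sampler at every timestep, so the edited image inherits the spatial layout of the input while the edited text embedding drives the content change.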