We investigate the problem of zero-shot semantic image painting. Instead of painting modifications into an image using only concrete colors or a finite set of semantic concepts, we ask how to create semantic paint based on open full-text descriptions: our goal is to be able to point to a location in a synthesized image and apply an arbitrary new concept such as "rustic," "opulent," or "happy dog." To do this, our method combines a state-of-the-art generative model of realistic images with a state-of-the-art text-image semantic similarity network. We find that, to make large changes, it is important to use non-gradient methods to explore latent space, and to relax the computations of the GAN so that changes can be targeted to a specific region. We conduct user studies comparing our methods to several baselines.
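The method described above can be sketched schematically. In this toy example, the pretrained GAN generator and the text-image similarity network are stood in for by simple hypothetical functions (`generator` and `similarity` are assumptions, not the paper's actual models); what the sketch illustrates is the structure of the search: a non-gradient explorer of latent space whose objective is scored only inside a user-chosen region mask.

```python
import numpy as np

def generator(z):
    # Stand-in for a GAN generator: maps a latent vector to a small "image".
    # In the real system this would be a pretrained generative model.
    return np.outer(z, z)

def masked_similarity(image, mask):
    # Stand-in for a text-image similarity score, evaluated only inside the
    # region mask so that edits are targeted to that region.
    return float((image * mask).sum())

def nongradient_search(z0, mask, iters=200, sigma=0.1, seed=0):
    # A minimal (1+1) evolutionary search: perturb the latent at random and
    # keep the perturbation only if the masked score improves. This is one
    # simple instance of a non-gradient method for exploring latent space.
    rng = np.random.default_rng(seed)
    z = z0.copy()
    best = masked_similarity(generator(z), mask)
    for _ in range(iters):
        cand = z + sigma * rng.standard_normal(z.shape)
        score = masked_similarity(generator(cand), mask)
        if score > best:
            z, best = cand, score
    return z, best

mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0          # apply the concept only in this region
z0 = np.zeros(8)
z, score = nongradient_search(z0, mask)
```

Because the search only ever accepts improving perturbations, `score` is guaranteed to be at least the initial masked score; the same accept/reject loop applies unchanged when `generator` and `masked_similarity` are replaced by real networks.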