We consider the targeted image editing problem: blending a region in a source image with a driver image that specifies the desired change. Unlike prior work, we solve this problem by learning a conditional probability distribution of edits end-to-end. Training such a model requires addressing a fundamental technical challenge: the lack of example edits to learn from. To this end, we propose a self-supervised approach that simulates edits by augmenting off-the-shelf images in a target domain. The benefits are remarkable: implemented as a state-of-the-art auto-regressive transformer, our approach is simple, sidesteps the difficulties of previous methods based on GAN-like priors, obtains significantly better edits, and is efficient. Furthermore, we show that different blending effects can be learned by intuitively controlling the augmentation process, with no changes to the model architecture. In extensive quantitative and qualitative experiments, including human studies, we demonstrate that this approach significantly outperforms prior work across several datasets.
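To make the self-supervised training idea concrete, here is a minimal sketch of how edit triplets might be simulated from a single unlabeled image, assuming the general strategy described above: the original image serves as the ground-truth edited result, the driver is an augmented crop of a random region, and the source has that region blanked out. The function name `simulate_edit`, the region sampling, and the specific augmentations are illustrative assumptions, not the paper's exact pipeline.

```python
import random

from PIL import Image
import torchvision.transforms as T


def simulate_edit(image: Image.Image):
    """Build a (source, driver, region, target) training tuple from one image.

    Hypothetical sketch: the unedited image is the reconstruction target;
    the model must blend the augmented driver back into the masked source.
    """
    w, h = image.size
    # Sample a random rectangular edit region.
    rw = random.randint(w // 8, w // 2)
    rh = random.randint(h // 8, h // 2)
    x = random.randint(0, w - rw)
    y = random.randint(0, h - rh)
    region = (x, y, x + rw, y + rh)

    # Driver: an augmented copy of the ground-truth region. Stronger
    # augmentations would train looser blends; weaker ones, more
    # faithful copies (the "intuitive control" noted in the abstract).
    augment = T.Compose([
        T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
        T.RandomAffine(degrees=10, translate=(0.05, 0.05), scale=(0.9, 1.1)),
    ])
    driver = augment(image.crop(region))

    # Source: the image with the edit region filled with a flat color.
    source = image.copy()
    source.paste((127, 127, 127), region)

    target = image  # ground truth: the unedited original
    return source, driver, region, target
```

Under this setup, a generative model trained to reconstruct the target from (source, driver, region) learns exactly the conditional distribution of edits the abstract describes, with the augmentation strength acting as the knob for the blending effect.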