Language-guided image editing has achieved great success recently. In this paper, we investigate, for the first time, exemplar-guided image editing, which enables more precise control. We achieve this goal by leveraging self-supervised training to disentangle and re-organize the source image and the exemplar. However, a naive approach causes obvious fusing artifacts. We carefully analyze this and propose an information bottleneck and strong augmentations to avoid the trivial solution of directly copying and pasting the exemplar image. Meanwhile, to ensure controllability of the editing process, we design an arbitrary-shape mask for the exemplar image and leverage classifier-free guidance to increase similarity to the exemplar. The whole framework requires only a single forward pass of the diffusion model, without any iterative optimization. We demonstrate that our method achieves impressive performance and enables controllable editing of in-the-wild images with high fidelity.
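The classifier-free guidance mentioned above is commonly implemented by blending a conditional and an unconditional noise prediction at each denoising step, with the guidance scale controlling how strongly the sample is pushed toward the condition (here, the exemplar). The sketch below illustrates only this standard blending rule with NumPy arrays standing in for the denoiser's outputs; the function name and the `scale` value are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, scale=5.0):
    """Blend unconditional and exemplar-conditioned noise predictions.

    scale = 0 recovers the unconditional prediction, scale = 1 the
    conditional one; larger values increase similarity to the exemplar.
    (Illustrative sketch of the standard CFG rule, not the paper's code.)
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Toy stand-ins for the two denoiser outputs at one diffusion step.
rng = np.random.default_rng(0)
eps_u = rng.standard_normal((4, 4))
eps_c = rng.standard_normal((4, 4))

guided = classifier_free_guidance(eps_u, eps_c, scale=5.0)

# Sanity checks on the two boundary scales.
assert np.allclose(classifier_free_guidance(eps_u, eps_c, 0.0), eps_u)
assert np.allclose(classifier_free_guidance(eps_u, eps_c, 1.0), eps_c)
```

In practice the guided prediction replaces the raw conditional prediction inside the sampler's update step, so the extra cost is one additional forward pass per step rather than any iterative optimization.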