Generative models, particularly GANs, have been widely used for image editing. Although GAN-based methods perform well at generating plausible content aligned with the user's intentions, they struggle to strictly preserve the content outside the editing region. To address this issue, we use diffusion models instead of GANs and propose a novel image-editing method based on pixel-wise guidance. Specifically, we first train pixel classifiers with a few annotated images and then estimate the semantic segmentation map of a target image. Users then manipulate the map to instruct how the image should be edited. The diffusion model generates an edited image via guidance from the pixel-wise classifiers, such that the resultant image aligns with the manipulated map. Because the guidance is applied pixel by pixel, the proposed method can create plausible content in the editing region while preserving the content outside it. The experimental results validate the advantages of the proposed method both quantitatively and qualitatively.
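The following is a minimal sketch of how such pixel-wise classifier guidance might be wired into a single denoising step. It is an illustration under assumptions, not the paper's exact formulation: the interfaces `eps_model` (noise predictor) and `pixel_classifier` (per-pixel logits), as well as the specific guidance scaling, are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def pixelwise_guided_step(x_t, t, eps_model, pixel_classifier, target_map,
                          alpha_bar_t, guidance_scale=1.0):
    """One denoising step with pixel-wise classifier guidance (sketch).

    x_t:              current noisy image, shape (B, C, H, W)
    eps_model:        assumed noise-prediction network eps_theta(x_t, t)
    pixel_classifier: assumed network mapping an image to per-pixel class
                      logits of shape (B, K, H, W)
    target_map:       user-edited segmentation map, shape (B, H, W), long
    alpha_bar_t:      cumulative noise schedule coefficient at step t
    """
    x_t = x_t.detach().requires_grad_(True)
    # Per-pixel cross-entropy between classifier predictions and the edited
    # map; its gradient w.r.t. x_t steers every pixel toward its target
    # label, so pixels whose label is unchanged receive little push.
    logits = pixel_classifier(x_t, t)
    loss = F.cross_entropy(logits, target_map, reduction="sum")
    grad = torch.autograd.grad(loss, x_t)[0]

    with torch.no_grad():
        eps = eps_model(x_t, t)
        # Shift the predicted noise by the scaled guidance gradient,
        # in the spirit of classifier guidance for diffusion models.
        eps_guided = eps + guidance_scale * (1.0 - alpha_bar_t).sqrt() * grad
    return eps_guided

if __name__ == "__main__":
    # Toy usage with dummy stand-ins for the trained networks.
    B, C, K, H, W = 1, 3, 5, 32, 32
    x_t = torch.randn(B, C, H, W)
    target = torch.randint(0, K, (B, H, W))
    eps_model = lambda x, t: torch.zeros_like(x)          # dummy predictor
    head = torch.nn.Conv2d(C, K, kernel_size=1)
    pixel_classifier = lambda x, t: head(x)               # dummy classifier
    eps = pixelwise_guided_step(x_t, torch.tensor([10]), eps_model,
                                pixel_classifier, target,
                                alpha_bar_t=torch.tensor(0.5))
    print(eps.shape)  # torch.Size([1, 3, 32, 32])
```

Because the guidance term is a spatial gradient map rather than a global score, it naturally leaves pixels outside the edited region of the segmentation map largely untouched, which is the intuition behind the preservation property claimed above.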