Image manipulation under the guidance of textual descriptions has recently received broad attention. In this study, we focus on the regional editing of images guided by given text prompts. Different from current mask-based image editing methods, we propose a novel region-aware diffusion model (RDM) for entity-level image editing, which can automatically locate the region of interest and replace it according to given text prompts. To strike a balance between image fidelity and inference speed, we design an intensive diffusion pipeline by combining latent-space diffusion and enhanced directional guidance. In addition, to preserve image content in non-edited regions, we introduce region-aware entity editing, which modifies the region of interest while preserving the region outside it. We validate the proposed RDM against baseline methods through extensive qualitative and quantitative experiments. The results show that RDM outperforms previous approaches in terms of visual quality, overall harmonization, content preservation in non-edited regions, and text-image semantic consistency. The code is available at https://github.com/haha-lisa/RDM-Region-Aware-Diffusion-Model.
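To make the region-preserving mechanism concrete, the following is a minimal sketch of per-step latent blending during reverse diffusion: inside the located region of interest the latent follows the text-guided denoising trajectory, while outside it the latent is replaced at every step by a re-noised copy of the source latent. The names `denoise_step`, `add_noise`, `mask`, and `timesteps` are hypothetical stand-ins for illustration, not the authors' actual API.

```python
import torch

@torch.no_grad()
def region_aware_edit(z_orig, mask, denoise_step, add_noise, timesteps):
    """Sketch of region-aware latent blending (assumed interfaces).

    z_orig:       latent of the source image, shape (1, C, H, W)
    mask:         binary region-of-interest mask in latent space, 1 inside the region
    denoise_step: hypothetical callable running one text-guided reverse step
    add_noise:    hypothetical callable noising z_orig to the given timestep
    timesteps:    descending diffusion timesteps
    """
    z = torch.randn_like(z_orig)  # edited region starts from pure noise
    for t in timesteps:
        z = denoise_step(z, t)                # text-guided reverse diffusion step
        z_keep = add_noise(z_orig, t)         # source content at the matching noise level
        z = mask * z + (1.0 - mask) * z_keep  # blend: edit inside, preserve outside
    return z
```

Because the blend is applied at every reverse step, latents outside the region of interest remain tied to the source image throughout sampling, which is what keeps non-edited content intact in the final output.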