Text-guided image inpainting aims to generate new content within specified regions of an image according to a user-provided textual prompt. The primary challenge is to accurately align the inpainted areas with the prompt while maintaining high visual fidelity. While existing inpainting methods produce visually convincing results by leveraging pre-trained text-to-image diffusion models, they still struggle to uphold prompt alignment and visual rationality simultaneously. In this work, we introduce FreeInpaint, a plug-and-play, tuning-free approach that directly optimizes the diffusion latents on the fly during inference to improve the faithfulness of the generated images. Technically, we introduce a prior-guided noise optimization method that steers model attention toward valid inpainting regions by optimizing the initial noise. Furthermore, we meticulously design a composite guidance objective tailored specifically to the inpainting task. This objective efficiently directs the denoising process, enhancing prompt alignment and visual rationality by optimizing the intermediate latents at each step. Through extensive experiments involving various inpainting diffusion models and evaluation metrics, we demonstrate the effectiveness and robustness of the proposed FreeInpaint.
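The general mechanism described above — steering a frozen diffusion model at inference time by taking gradient steps on the intermediate latents against a guidance objective — can be sketched in a toy form. This is only a minimal illustration of the idea, not FreeInpaint's actual objective or denoiser: the `guidance_objective`, the mask-based target, and the stand-in `denoise_step` are all hypothetical simplifications, since the abstract does not specify the concrete formulation.

```python
import numpy as np

def guidance_objective_grad(latent, mask, target):
    # Gradient of a toy quadratic guidance objective that pulls the
    # masked (inpainting) region of the latent toward a target value.
    # A real composite objective would combine prompt-alignment and
    # visual-rationality terms computed from the diffusion model.
    return 2.0 * mask * (latent - target)

def denoise_step(latent, t):
    # Stand-in for one denoising step of a pre-trained diffusion model;
    # real models predict and remove noise conditioned on the prompt.
    return 0.9 * latent

def guided_inpaint(latent, mask, target, steps=10, lr=0.1):
    # Tuning-free guidance loop: at each denoising step, first nudge the
    # intermediate latent down the objective's gradient (no model weights
    # are updated), then apply the frozen denoiser.
    for t in range(steps):
        latent = latent - lr * guidance_objective_grad(latent, mask, target)
        latent = denoise_step(latent, t)
    return latent
```

In this sketch the masked region is driven toward the target faster than the unmasked region decays, mirroring how per-step latent optimization biases generation inside the inpainting mask while the denoiser governs the rest of the image.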