Finding an initial noise vector that produces an input image when fed into the diffusion process (known as inversion) is an important problem in denoising diffusion models (DDMs), with applications in real image editing. The state-of-the-art approach for real image editing with inversion uses denoising diffusion implicit models (DDIMs) to deterministically noise the image to the intermediate state along the path that the denoising would follow given the original conditioning. However, DDIM inversion for real images is unstable because it relies on local linearization assumptions, which cause errors to propagate, leading to incorrect image reconstruction and loss of content. To alleviate these problems, we propose Exact Diffusion Inversion via Coupled Transformations (EDICT), an inversion method that draws inspiration from affine coupling layers. EDICT enables mathematically exact inversion of real and model-generated images by maintaining two coupled noise vectors that are used to invert each other in an alternating fashion. Using Stable Diffusion, a state-of-the-art latent diffusion model, we demonstrate that EDICT successfully reconstructs real images with high fidelity. On complex image datasets such as MS-COCO, EDICT reconstruction significantly outperforms DDIM, reducing the mean squared error of reconstruction by a factor of two. Using noise vectors inverted from real images, EDICT enables a wide range of image edits, from local and global semantic edits to image stylization, while maintaining fidelity to the original image structure. EDICT requires no model training or fine-tuning, prompt tuning, or extra data, and can be combined with any pretrained DDM. Code is available at https://github.com/salesforce/EDICT.
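To make the contrast between DDIM inversion and coupled inversion concrete, here is a minimal sketch on a scalar toy latent. Everything concrete in it is an assumption for illustration: the schedule `ALPHA_BAR`, the step count `T`, the mixing weight `P`, and the toy predictor `eps`, which stands in for a trained noise-prediction network. The alternating update is modeled on the coupled-transformation idea described above and is not copied from the EDICT repository. It shows why the coupled version inverts exactly: each sub-step changes one sequence using only the other, so it can be undone algebraically however nonlinear `eps` is. DDIM inversion, by contrast, has to approximate the noise prediction at the unknown next state with the prediction at the current state.

```python
import math

T = 10    # number of diffusion steps (illustrative assumption)
P = 0.93  # mixing weight of the averaging sub-steps (assumed value)
# Hypothetical cumulative schedule alpha_bar_t, decreasing in t and always > 0.
ALPHA_BAR = [0.999 - 0.09 * t for t in range(T + 1)]


def coeffs(t):
    """DDIM coefficients for the step t -> t-1: x_{t-1} = a * x_t + b * eps(x_t, t)."""
    a_prev, a_t = ALPHA_BAR[t - 1], ALPHA_BAR[t]
    a = math.sqrt(a_prev / a_t)
    b = math.sqrt(1 - a_prev) - math.sqrt(a_prev * (1 - a_t) / a_t)
    return a, b


def eps(z, t):
    """Toy nonlinear stand-in for a trained noise-prediction network (hypothetical)."""
    return 0.5 * math.sin(3 * z) + 0.05 * t


def ddim_denoise(x):
    for t in range(T, 0, -1):
        a, b = coeffs(t)
        x = a * x + b * eps(x, t)
    return x


def ddim_invert(x):
    for t in range(1, T + 1):
        a, b = coeffs(t)
        # Linearization: eps at the current state replaces eps at the unknown x_t.
        x = (x - b * eps(x, t)) / a
    return x


def coupled_denoise(x, y):
    """Alternating update: each sequence is advanced using the other one."""
    for t in range(T, 0, -1):
        a, b = coeffs(t)
        x_inter = a * x + b * eps(y, t)
        y_inter = a * y + b * eps(x_inter, t)
        x = P * x_inter + (1 - P) * y_inter
        y = P * y_inter + (1 - P) * x
    return x, y


def coupled_invert(x, y):
    """Undo coupled_denoise sub-step by sub-step, in reverse order (exact algebra)."""
    for t in range(1, T + 1):
        a, b = coeffs(t)
        y_inter = (y - (1 - P) * x) / P
        x_inter = (x - (1 - P) * y_inter) / P
        y = (y_inter - b * eps(x_inter, t)) / a
        x = (x_inter - b * eps(y, t)) / a
    return x, y


if __name__ == "__main__":
    x0 = 0.7  # toy "image latent"; both coupled copies start from it
    xT, yT = coupled_invert(x0, x0)
    xr, _ = coupled_denoise(xT, yT)
    print("coupled round-trip error:", abs(xr - x0))
    print("DDIM round-trip error:   ", abs(ddim_denoise(ddim_invert(x0)) - x0))
```

Running it should show a coupled round-trip error at the level of floating-point rounding, while the DDIM round trip drifts because of the linearization; the exact magnitudes depend entirely on the toy choices above.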