Blended Latent Diffusion

The tremendous progress in neural image generation, coupled with the emergence of seemingly omnipotent vision-language models, has finally enabled text-based interfaces for creating and editing images. Handling generic images requires a diverse underlying generative model, hence the latest works utilize diffusion models, which were shown to surpass GANs in terms of diversity. One major drawback of diffusion models, however, is their relatively slow inference time. In this paper, we present an accelerated solution to the task of local text-driven editing of generic images, where the desired edits are confined to a user-provided mask. Our solution leverages a recent text-to-image Latent Diffusion Model (LDM), which speeds up diffusion by operating in a lower-dimensional latent space. We first convert the LDM into a local image editor by incorporating Blended Diffusion into it. Next, we propose an optimization-based solution for the inherent inability of this LDM to accurately reconstruct images. Finally, we address the scenario of performing local edits using thin masks. We evaluate our method against the available baselines both qualitatively and quantitatively, and demonstrate that, in addition to being faster, our method achieves better precision than the baselines while mitigating some of their artifacts. The project page is available at https://omriavrahami.com/blended-latent-diffusion-page/
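To make the idea of incorporating Blended Diffusion into a latent model more concrete, the following is a minimal NumPy sketch of a single blending step in latent space. It is an illustration under stated assumptions, not the authors' implementation: the function names (`downsample_mask`, `noise_to_level`, `blend_latents`), the max-pooling used to shrink the mask, the downsampling factor of 8, the latent shape, and the noise level `alpha_bar_t` are all hypothetical choices. The only standard ingredient is the usual forward-noising formula of diffusion models.

```python
import numpy as np

def downsample_mask(mask, factor):
    """Shrink a binary (H, W) mask to latent resolution by max-pooling.
    Max-pooling is an assumption of this sketch; it keeps thin regions alive."""
    h, w = mask.shape
    m = mask[: h - h % factor, : w - w % factor]
    m = m.reshape(h // factor, factor, w // factor, factor)
    return m.max(axis=(1, 3)).astype(np.float32)

def noise_to_level(z0, alpha_bar_t, rng):
    """Standard forward diffusion: sample z_t from q(z_t | z_0)."""
    eps = rng.standard_normal(z0.shape).astype(z0.dtype)
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps

def blend_latents(z_fg, z_bg, latent_mask):
    """Keep the text-guided latent inside the mask and the (noised)
    original-image latent outside it. Shapes: (C, h, w) and (h, w)."""
    return z_fg * latent_mask + z_bg * (1.0 - latent_mask)

# Hypothetical sizes: a 64x64 image mask and a 4-channel 8x8 latent.
rng = np.random.default_rng(0)
mask = np.zeros((64, 64), dtype=np.float32)
mask[16:32, 24:40] = 1.0                      # user-provided edit region
latent_mask = downsample_mask(mask, 8)        # mask at latent resolution

z_bg0 = rng.standard_normal((4, 8, 8)).astype(np.float32)  # stand-in for the encoded input image
z_fg = rng.standard_normal((4, 8, 8)).astype(np.float32)   # stand-in for a text-guided denoised latent
z_bg_t = noise_to_level(z_bg0, alpha_bar_t=0.5, rng=rng)   # background at the same noise level
z_t = blend_latents(z_fg, z_bg_t, latent_mask)
```

In a full editor this blend would be repeated at every denoising step, with `z_fg` coming from the text-conditioned model rather than from random numbers as in this stand-in.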

