Retrieval-Augmented Diffusion Models

Generative image synthesis with diffusion models has recently achieved excellent visual quality in several tasks such as text-based or class-conditional image synthesis. Much of this success is due to a dramatic increase in the computational capacity invested in training these models. This work presents an alternative approach: inspired by its successful application in natural language processing, we propose to complement the diffusion model with a retrieval-based approach and to introduce an explicit memory in the form of an external database. During training, our diffusion model is conditioned on similar visual features, retrieved via CLIP from the neighborhood of each training instance. By leveraging CLIP's joint image-text embedding space, our model achieves highly competitive performance on tasks for which it has not been explicitly trained, such as class-conditional or text-to-image synthesis, and can be conditioned on both text and image embeddings. Moreover, we can apply our approach to unconditional generation, where it achieves state-of-the-art performance. Our approach incurs low computational and memory overheads and is easy to implement. We discuss its relationship to concurrent work and will publish code and pretrained models soon.
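
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the external database is a list of precomputed embedding vectors, uses random vectors as stand-ins for CLIP embeddings, and ranks entries by cosine similarity. The helper names `normalize` and `retrieve_neighbors` are hypothetical.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def retrieve_neighbors(query, database, k):
    """Return indices of the k database embeddings most similar to the query."""
    q = normalize(query)
    scores = [
        (sum(a * b for a, b in zip(q, normalize(entry))), idx)
        for idx, entry in enumerate(database)
    ]
    # Highest cosine similarity first.
    scores.sort(key=lambda s: s[0], reverse=True)
    return [idx for _, idx in scores[:k]]


# Assumption: random vectors stand in for precomputed CLIP embeddings.
rng = random.Random(0)
database = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(100)]

# A query that lies very close to database entry 42.
query = [x + 0.01 for x in database[42]]

neighbor_ids = retrieve_neighbors(query, database, k=4)

# The retrieved embeddings would serve as the conditioning input of the model.
conditioning = [database[i] for i in neighbor_ids]
```

Because text and image embeddings share one space in CLIP, the same lookup would in principle accept a text embedding as the query. How the retrieved vectors are fed into the diffusion model is outside the scope of this sketch.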
