DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

The most successful recent image synthesis models are multi-stage pipelines that combine the advantages of different methods. They always include a VAE-like model that faithfully reconstructs an image from its embedding and a prior model that generates the image embedding. At the same time, diffusion models have shown the capacity to generate high-quality synthetic images. Our work proposes a VQ-VAE architecture with a diffusion decoder (DiVAE) to serve as the reconstruction component in image synthesis. We explore how to feed the image embedding into the diffusion model to achieve strong performance and find that a simple modification to the diffusion model's UNet is sufficient. Trained on ImageNet, our model achieves state-of-the-art results and, in particular, generates more photorealistic images. In addition, we pair DiVAE with an auto-regressive generator on conditional synthesis tasks to produce more human-feeling and detailed samples.

