Vector quantized diffusion (VQ-Diffusion) is a powerful generative model for text-to-image synthesis, but it can still sometimes generate low-quality samples or images that are only weakly correlated with the input text. We find that these issues are mainly due to a flawed sampling strategy. In this paper, we propose two important techniques to further improve the sample quality of VQ-Diffusion. 1) We explore classifier-free guidance sampling for discrete denoising diffusion models and propose a more general and effective implementation of classifier-free guidance. 2) We present a high-quality inference strategy to alleviate the joint distribution issue in VQ-Diffusion. Finally, we conduct experiments on various datasets to validate the effectiveness of these techniques and show that the improved VQ-Diffusion surpasses the vanilla version by large margins. We achieve an FID score of 8.44 on MSCOCO, surpassing VQ-Diffusion by 5.42 FID points. When trained on ImageNet, we dramatically improve the FID score from 11.89 to 4.83, demonstrating the superiority of our proposed techniques.
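To make the idea of classifier-free guidance in a discrete diffusion model concrete, the sketch below applies guidance to one token's categorical distribution over codebook indices. It assumes the common formulation in which the model predicts log-probabilities for x0 both with and without the text condition, the two are combined in log space as log p(x0|xt) + s·(log p(x0|xt, y) − log p(x0|xt)), and the result is renormalized. The function names and the scale parameter `scale` are hypothetical; this is a minimal illustration, not necessarily the paper's exact implementation.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Numerically stable log-softmax: subtract the max before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def guided_log_probs(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance for one token's categorical distribution.

    cond / uncond: hypothetical log-probabilities over K codebook indices,
    predicted with and without the text condition.
    scale: guidance scale s; s = 1 recovers the conditional prediction.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    # Move away from the unconditional prediction toward (and past) the conditional one.
    mixed = [u + scale * (c - u) for c, u in zip(cond, uncond)]
    # Renormalize so the guided result is again a valid categorical distribution.
    return log_softmax(mixed)


if __name__ == "__main__":
    cond = log_softmax([2.0, 1.0, 0.0])    # text-conditioned prediction
    uncond = log_softmax([1.0, 1.0, 1.0])  # unconditional (uniform) prediction
    guided = [math.exp(v) for v in guided_log_probs(cond, uncond, 3.0)]
    print([round(p, 3) for p in guided])
```

With a scale above 1, probability mass concentrates on the indices the text condition favors, which is how guidance trades diversity for stronger text-image correlation. Renormalizing after the log-space combination matters because the extrapolated scores no longer sum to one.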