Progressive Denoising Model for Fine-Grained Text-to-Image Generation

Recently, vector quantized autoregressive (VQ-AR) models have shown remarkable results in text-to-image synthesis by predicting discrete image tokens one at a time, from the top left to the bottom right of the latent space, treating every token equally. Although this simple generative process works surprisingly well, is it the best way to generate an image? Human artists, for instance, tend to work from outline to detail, whereas VQ-AR models do not consider the relative importance of each image component. In this paper, we present a progressive denoising model for high-fidelity text-to-image generation. The proposed method works by creating new image tokens from coarse to fine, in parallel, conditioned on the existing context, and this procedure is applied recursively until the image sequence is complete. The resulting coarse-to-fine hierarchy makes the image generation process intuitive and interpretable. Extensive experiments demonstrate that the progressive model achieves significantly better FID scores than the previous VQ-AR method across a wide variety of categories and aspects. Moreover, the generation time of conventional AR models increases linearly with the output image resolution, which makes them time-consuming even for normal-size images. In contrast, our approach achieves a better trade-off between generation quality and speed.
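The abstract contrasts raster-order VQ-AR decoding, which needs one sequential model call per token, with filling in new tokens in parallel from coarse to fine over a small number of stages. The minimal sketch below illustrates only that step-count difference, under loudly stated assumptions: `toy_predictor` is a hypothetical stand-in that ignores the text prompt and the committed context, the geometric stage schedule in `progressive_schedule` is invented for illustration, and committing the most confident empty positions at each stage is borrowed from confidence-based masked parallel decoding (as in MaskGIT). None of these is claimed to be the paper's actual procedure.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical predictor interface: given the partially filled token sequence,
# return a (token_id, confidence) pair for every position in one parallel pass.
Predictor = Callable[[Sequence[Optional[int]]], List[Tuple[int, float]]]


def raster_ar_steps(num_tokens: int) -> int:
    """Raster-order VQ-AR decoding: one sequential model call per token."""
    return num_tokens


def progressive_schedule(num_tokens: int, num_stages: int) -> List[int]:
    """Hypothetical schedule: number of new tokens committed at each stage.

    Stage sizes grow geometrically (weights 1, 2, 4, ...), so early stages
    place a few "outline" tokens and later stages fill in many "detail"
    tokens. The last stage absorbs any rounding remainder.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be >= 1")
    weights = [2 ** s for s in range(num_stages)]
    total = sum(weights)
    sizes = [num_tokens * w // total for w in weights]
    sizes[-1] += num_tokens - sum(sizes)
    return sizes


def progressive_decode(
    predict: Predictor, num_tokens: int, num_stages: int
) -> Tuple[List[Optional[int]], int]:
    """Fill the sequence in a few parallel stages; return (tokens, model_calls)."""
    seq: List[Optional[int]] = [None] * num_tokens
    calls = 0
    for k in progressive_schedule(num_tokens, num_stages):
        preds = predict(seq)  # one forward pass scores every position at once
        calls += 1
        empty = [i for i, t in enumerate(seq) if t is None]
        # Commit the k most confident still-empty positions (assumed rule).
        empty.sort(key=lambda i: preds[i][1], reverse=True)
        for i in empty[:k]:
            seq[i] = preds[i][0]
    return seq, calls


def toy_predictor(seq: Sequence[Optional[int]]) -> List[Tuple[int, float]]:
    """Hypothetical toy: token id = position mod 16; positions on a coarse
    4-stride grid get higher confidence, mimicking "outline first"."""
    side = math.isqrt(len(seq))
    out = []
    for i in range(len(seq)):
        r, c = divmod(i, side)
        coarse = r % 4 == 0 and c % 4 == 0
        out.append((i % 16, 1.0 if coarse else 0.5))
    return out


if __name__ == "__main__":
    n = 32 * 32  # e.g. a 32x32 grid of latent image tokens
    tokens, calls = progressive_decode(toy_predictor, n, num_stages=8)
    print(raster_ar_steps(n), calls)  # prints "1024 8"
```

With a 32x32 token grid, raster-order decoding needs 1024 sequential calls while the staged sketch needs 8; in a real model each parallel call is more expensive and the schedule trades quality against speed, which is the trade-off the abstract refers to.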

