Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes

November 24, 2021

Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes

Sam Bond-Taylor, Peter Hessey, Hiroshi Sasaki, Toby P. Breckon, Chris G. Willcocks

From arXiv; 19 pages, 14 figures

Whilst diffusion probabilistic models can generate high quality image content, key limitations remain in terms of both generating high-resolution imagery and their associated high computational requirements. Recent Vector-Quantized image models have overcome this limitation of image resolution but are prohibitively slow and unidirectional as they generate tokens via element-wise autoregressive sampling from the prior. By contrast, in this paper we propose a novel discrete diffusion probabilistic model prior which enables parallel prediction of Vector-Quantized tokens by using an unconstrained Transformer architecture as the backbone. During training, tokens are randomly masked in an order-agnostic manner and the Transformer learns to predict the original tokens. This parallelism of Vector-Quantized token prediction in turn facilitates unconditional generation of globally consistent high-resolution and diverse imagery at a fraction of the computational expense. In this manner, we can generate image resolutions exceeding that of the original training set samples whilst additionally provisioning per-image likelihood estimates (in a departure from generative adversarial approaches). Our approach achieves state-of-the-art results in terms of Density (LSUN Bedroom: 1.51; LSUN Churches: 1.12; FFHQ: 1.20) and Coverage (LSUN Bedroom: 0.83; LSUN Churches: 0.73; FFHQ: 0.80), and performs competitively on FID (LSUN Bedroom: 3.64; LSUN Churches: 4.07; FFHQ: 6.11) whilst offering advantages in terms of both computation and reduced training set requirements.
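The training and sampling ideas in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes a linear absorbing schedule in which each token is independently replaced by a `[MASK]` state with probability t/T at step t. Under that schedule, a masked token at step t was unmasked at step t-1 with probability 1/t, which gives both the 1/t weight on the masked-token loss and the per-step unmasking probability used when sampling. `predict_probs` is a hypothetical stand-in for the Transformer: it returns one categorical distribution over the VQ codebook for every position, all computed in parallel.

```python
import math
import random

MASK = -1  # hypothetical id for the absorbing [MASK] state


def corrupt(x0, t, T, rng):
    """Forward absorbing process (assumed linear schedule): mask each token with prob t/T."""
    return [MASK if rng.random() < t / T else tok for tok in x0]


def training_loss(x0, T, predict_probs, rng):
    """One-sample estimate of the order-agnostic masked objective.

    Samples t uniformly, masks tokens at random (no fixed order), and scores
    only the masked positions, weighted by 1/t.
    """
    t = rng.randint(1, T)
    xt = corrupt(x0, t, T, rng)
    probs = predict_probs(xt)  # one distribution per position, predicted in parallel
    nll = sum(-math.log(probs[i][x0[i]]) for i, tok in enumerate(xt) if tok == MASK)
    return nll / t


def sample(length, T, predict_probs, rng):
    """Reverse process: start fully masked; at step t reveal each masked token with prob 1/t."""
    x = [MASK] * length
    for t in range(T, 0, -1):
        probs = predict_probs(x)  # every position is predicted in one network call
        for i in range(length):
            if x[i] == MASK and rng.random() < 1.0 / t:
                x[i] = rng.choices(range(len(probs[i])), weights=probs[i])[0]
    return x  # at t == 1 the reveal probability is 1, so no MASK remains


def uniform_model(codebook_size):
    """Hypothetical placeholder for the Transformer: uniform over the codebook."""
    def predict_probs(xt):
        return [[1.0 / codebook_size] * codebook_size for _ in xt]
    return predict_probs


if __name__ == "__main__":
    rng = random.Random(0)
    model = uniform_model(codebook_size=16)
    x0 = [rng.randrange(16) for _ in range(64)]  # e.g. a flattened 8x8 grid of VQ codes
    print(training_loss(x0, T=100, predict_probs=model, rng=rng))
    print(sample(length=64, T=100, predict_probs=model, rng=rng))
```

In this sketch the number of network calls is fixed at T regardless of how many tokens are generated, whereas element-wise autoregressive sampling needs one call per token; that difference is the source of the speed-up the abstract describes. The schedule, the `MASK` id, and the uniform placeholder model are all assumptions made for illustration.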