Non-autoregressive generative transformers have recently demonstrated impressive image generation performance, with sampling that is orders of magnitude faster than that of their autoregressive counterparts. However, optimal parallel sampling from the true joint distribution of visual tokens remains an open challenge. In this paper, we introduce Token-Critic, an auxiliary model that guides the sampling of a non-autoregressive generative transformer. Given a masked-and-reconstructed real image, the Token-Critic model is trained to distinguish which visual tokens belong to the original image and which were sampled by the generative transformer. During non-autoregressive iterative sampling, Token-Critic is used to select which tokens to accept and which to reject and resample. Coupled with Token-Critic, a state-of-the-art generative transformer significantly improves its performance, and outperforms recent diffusion models and GANs in terms of the trade-off between generated image quality and diversity on the challenging task of class-conditional ImageNet generation.
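To make the critic-guided sampling loop concrete, the following is a minimal PyTorch sketch. The `generator` and `critic` interfaces, the cosine masking schedule, and constants such as `mask_id` are illustrative assumptions for exposition, not the paper's exact implementation.

```python
import math
import torch

def token_critic_sampling(generator, critic, class_label, seq_len=256,
                          num_steps=18, mask_id=8192):
    """Sketch of Token-Critic guided iterative decoding (assumed interfaces).

    `generator(tokens, class_label)` is assumed to return per-token logits over
    the visual-token codebook; `critic(tokens, class_label)` is assumed to
    return a per-token score of how "real" each token looks (higher = real).
    """
    tokens = torch.full((1, seq_len), mask_id)  # start from a fully masked canvas
    for step in range(num_steps):
        # 1) Sample a candidate token at every masked position, in parallel.
        logits = generator(tokens, class_label)                 # (1, seq_len, vocab)
        sampled = torch.distributions.Categorical(logits=logits).sample()
        candidate = torch.where(tokens == mask_id, sampled, tokens)

        # 2) Ask the critic which tokens look real vs. generated.
        scores = critic(candidate, class_label)                 # (1, seq_len)

        # 3) Keep the highest-scoring tokens; re-mask the rest for resampling.
        #    Cosine schedule is an assumption; the fraction masked shrinks to 0.
        frac_masked = math.cos(math.pi / 2 * (step + 1) / num_steps)
        num_masked = int(seq_len * frac_masked)
        if num_masked == 0:
            return candidate
        drop = scores[0].topk(num_masked, largest=False).indices
        tokens = candidate.clone()
        tokens[0, drop] = mask_id
    return tokens
```

The key design point the sketch illustrates: instead of the generator's own confidences deciding which parallel-sampled tokens to keep, a separately trained critic scores every token jointly, so tokens that are individually plausible but mutually inconsistent can be rejected and resampled.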