Vector-Quantized (VQ-based) generative models usually consist of two basic components, i.e., VQ tokenizers and generative transformers. Prior research focuses on improving the reconstruction fidelity of VQ tokenizers but rarely examines how this improvement in reconstruction affects the generation ability of generative transformers. In this paper, we find, surprisingly, that improving the reconstruction fidelity of VQ tokenizers does not necessarily improve generation. Instead, learning to compress semantic features within VQ tokenizers significantly improves generative transformers' ability to capture textures and structures. We thus highlight two competing objectives of VQ tokenizers for image synthesis: semantic compression and details preservation. Different from previous work that only pursues better details preservation, we propose Semantic-Quantized GAN (SeQ-GAN) with two learning phases to balance the two objectives. In the first phase, we propose a semantic-enhanced perceptual loss for better semantic compression. In the second phase, we fix the encoder and codebook, but enhance and fine-tune the decoder to achieve better details preservation. The proposed SeQ-GAN greatly improves VQ-based generative models and surpasses GAN and diffusion models on both unconditional and conditional image generation. Our SeQ-GAN (364M) achieves a Fréchet Inception Distance (FID) of 6.25 and an Inception Score (IS) of 140.9 on 256×256 ImageNet generation, a remarkable improvement over ViT-VQGAN (714M), which obtains 11.2 FID and 97.2 IS.
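The two-phase recipe described above can be made concrete with a short sketch. The following is a minimal, hypothetical PyTorch rendition, not the authors' implementation: the module definitions, the stand-in semantic feature extractor `feat` (a placeholder for a frozen, pretrained semantic backbone), and the omission of the adversarial (GAN) term are all assumptions made for illustration.

```python
# Hypothetical sketch of the two-phase tokenizer training described in the
# abstract. Phase 1 trains encoder, codebook, and decoder with a semantic
# feature-matching term; phase 2 freezes encoder and codebook and fine-tunes
# only the decoder for reconstruction fidelity. The adversarial loss used by
# a VQGAN-style tokenizer is omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VectorQuantizer(nn.Module):
    """Nearest-neighbor codebook lookup with a straight-through estimator."""

    def __init__(self, num_codes: int = 1024, dim: int = 256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z: torch.Tensor):
        # z: (B, C, H, W) -> flatten spatial positions to (B*H*W, C)
        b, c, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, c)
        idx = torch.cdist(flat, self.codebook.weight).argmin(dim=1)
        z_q = self.codebook(idx).view(b, h, w, c).permute(0, 3, 1, 2)
        # Standard VQ-VAE codebook and commitment losses.
        vq_loss = F.mse_loss(z_q, z.detach()) + 0.25 * F.mse_loss(z, z_q.detach())
        # Straight-through estimator: copy decoder gradients to the encoder.
        z_q = z + (z_q - z).detach()
        return z_q, vq_loss


def train_tokenizer(encoder, quantizer, decoder, feat, loader,
                    epochs_p1: int = 1, epochs_p2: int = 1):
    # --- Phase 1: semantic compression ------------------------------------
    # Train encoder, codebook, and decoder jointly. Reconstruction is
    # augmented with a feature-space term computed by `feat`, standing in
    # for the paper's semantic-enhanced perceptual loss.
    params = (list(encoder.parameters()) + list(quantizer.parameters())
              + list(decoder.parameters()))
    opt = torch.optim.Adam(params, lr=1e-4)
    for _ in range(epochs_p1):
        for x in loader:
            z_q, vq_loss = quantizer(encoder(x))
            x_rec = decoder(z_q)
            loss = (F.l1_loss(x_rec, x) + vq_loss
                    + F.mse_loss(feat(x_rec), feat(x).detach()))
            opt.zero_grad(); loss.backward(); opt.step()

    # --- Phase 2: details preservation -------------------------------------
    # Freeze the encoder and codebook so the learned semantic token space
    # stays intact, then fine-tune only the decoder for fidelity.
    for p in list(encoder.parameters()) + list(quantizer.parameters()):
        p.requires_grad_(False)
    opt = torch.optim.Adam(decoder.parameters(), lr=1e-4)
    for _ in range(epochs_p2):
        for x in loader:
            with torch.no_grad():
                z_q, _ = quantizer(encoder(x))
            x_rec = decoder(z_q)
            loss = F.l1_loss(x_rec, x)
            opt.zero_grad(); loss.backward(); opt.step()
```

Freezing the encoder and codebook in phase 2 is the key design choice: the token space the generative transformer will model is fixed by the semantic-compression phase, so decoder fine-tuning can chase reconstruction fidelity without disturbing the tokens themselves.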