TransGAN: Two Transformers Can Make One Strong GAN

The recent explosive interest in transformers has suggested their potential to become powerful "universal" models for computer vision tasks such as classification, detection, and segmentation. But how much further can transformers go: are they ready to take on more notoriously difficult vision tasks, e.g., generative adversarial networks (GANs)? Driven by that curiosity, we conduct the first pilot study in building a GAN \textbf{completely free of convolutions}, using only pure transformer-based architectures. Our vanilla GAN architecture, dubbed \textbf{TransGAN}, consists of a memory-friendly transformer-based generator that progressively increases feature resolution while decreasing embedding dimension, and a patch-level discriminator that is also transformer-based. We then demonstrate that TransGAN notably benefits from data augmentations (more than standard GANs do), a multi-task co-training strategy for the generator, and a locally initialized self-attention that emphasizes the neighborhood smoothness of natural images. Equipped with those findings, TransGAN can effectively scale up with bigger models and high-resolution image datasets. Our best architecture achieves highly competitive performance compared to current state-of-the-art GANs based on convolutional backbones. Specifically, TransGAN sets a \textbf{new state-of-the-art} IS of 10.10 and FID of 25.32 on STL-10. It also reaches a competitive IS of 8.64 and FID of 11.89 on CIFAR-10, and an FID of 12.23 on CelebA $64\times64$. We conclude with a discussion of the current limitations and future potential of TransGAN. The code is available at \url{https://github.com/VITA-Group/TransGAN}.
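
To make the "progressively increases feature resolution while decreasing embedding dimension" idea concrete, the following is a minimal shape-bookkeeping sketch, not the authors' implementation. It assumes that each upsampling step moves channels into space (pixel-shuffle style), so each side of the feature map grows by a factor `scale` while the embedding dimension shrinks by `scale ** 2`. The function name `stage_shapes` and the default values (`base_res=8`, `base_dim=1024`, three stages) are hypothetical choices for illustration.

```python
def stage_shapes(base_res=8, base_dim=1024, num_stages=3, scale=2):
    """Return (height, width, num_tokens, embed_dim) for each generator stage.

    Assumption (not taken from the abstract): every upsampling step
    rearranges channels into spatial positions, so the side length is
    multiplied by `scale` and the embedding dimension is divided by
    `scale ** 2`.
    """
    shapes = []
    res, dim = base_res, base_dim
    for stage in range(num_stages):
        # One token per spatial position of the feature map.
        shapes.append((res, res, res * res, dim))
        if stage < num_stages - 1:
            if dim % (scale * scale) != 0:
                raise ValueError("embed_dim must be divisible by scale ** 2")
            res *= scale
            dim //= scale * scale
    return shapes


if __name__ == "__main__":
    for h, w, tokens, dim in stage_shapes():
        # tokens * dim is the number of activations held per stage.
        print(f"{h}x{w}: {tokens} tokens, dim {dim}, activations {tokens * dim}")
```

Under these assumptions the token count rises from 64 to 1024 while the embedding dimension falls from 1024 to 64, so the per-stage activation count stays at 65536. This is one plausible reading of why such a generator can be called memory-friendly, although self-attention cost still grows with the number of tokens.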
