The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture for designing generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, autoregressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.