Computer graphics has experienced a recent surge of data-centric approaches for photorealistic and controllable content creation. StyleGAN in particular sets new standards for generative modeling regarding image quality and controllability. However, StyleGAN's performance severely degrades on large unstructured datasets such as ImageNet. StyleGAN was designed for controllability; hence, prior works suspect its restrictive design to be unsuitable for diverse datasets. In contrast, we find the main limiting factor to be the current training strategy. Following the recently introduced Projected GAN paradigm, we leverage powerful neural network priors and a progressive growing strategy to successfully train the latest StyleGAN3 generator on ImageNet. Our final model, StyleGAN-XL, sets a new state of the art in large-scale image synthesis and is the first to generate images at a resolution of $1024^2$ at such a dataset scale. We demonstrate that this model can invert and edit images beyond the narrow domain of portraits or specific object classes.
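To make the training strategy concrete, below is a minimal PyTorch sketch of the Projected GAN idea the abstract builds on, not the authors' implementation: a frozen, ImageNet-pretrained feature network serves as the fixed neural network prior, projecting images into its feature space, and only a small discriminator head on those features is trained. The backbone choice (`efficientnet_b0`), the class name `ProjectedDiscriminator`, and the single-scale head are illustrative assumptions; the actual method discriminates on multiple feature scales with random projections.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

# Hypothetical simplification of a projected discriminator: a frozen
# ImageNet-pretrained feature network acts as the fixed "neural network
# prior"; only a lightweight head on top of its features is trained.
class ProjectedDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = efficientnet_b0(weights="IMAGENET1K_V1").features
        for p in backbone.parameters():
            p.requires_grad_(False)  # the prior is never updated
        self.backbone = backbone.eval()
        # EfficientNet-B0's final feature map has 1280 channels; the
        # head outputs one realism logit per spatial location.
        self.head = nn.Conv2d(1280, 1, kernel_size=1)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        # The backbone's weights are frozen, but gradients still flow
        # *through* it to the generator's output during GAN training.
        feats = self.backbone(img)
        return self.head(feats)

disc = ProjectedDiscriminator()
logits = disc(torch.randn(2, 3, 224, 224))  # -> shape (2, 1, 7, 7)
```

The key design choice here is that the pretrained backbone is frozen yet remains differentiable, so the generator still receives a gradient signal shaped by the pretrained feature space rather than by raw pixels.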