Generative adversarial networks (GANs) have enabled photorealistic image synthesis and editing. However, due to the high computational cost of large-scale generators (e.g., StyleGAN2), it usually takes seconds to see the results of a single edit on edge devices, prohibiting interactive user experience. In this paper, we take inspiration from modern rendering software and propose Anycost GAN for interactive natural image editing. We train the Anycost GAN to support elastic resolutions and channels for faster image generation at versatile speeds. Running subsets of the full generator produces outputs that are perceptually similar to the full generator, making them a good proxy for preview. By using sampling-based multi-resolution training, adaptive-channel training, and a generator-conditioned discriminator, the anycost generator can be evaluated at various configurations while achieving better image quality compared to separately trained models. Furthermore, we develop new encoder training and latent code optimization techniques to encourage consistency between the different sub-generators during image projection. Anycost GAN can be executed at various cost budgets (up to 10x computation reduction) and adapt to a wide range of hardware and latency requirements. When deployed on desktop CPUs and edge devices, our model can provide perceptually similar previews at 6-12x speedup, enabling interactive image editing. The code and demo are publicly available: https://github.com/mit-han-lab/anycost-gan.
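The sampling-based multi-resolution and adaptive-channel training mentioned above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the configuration values, the `sample_subgenerator_config` and `active_channels` helpers, and the single-layer width are all hypothetical stand-ins for how a sub-generator configuration might be drawn per training step and mapped onto a shared full-width layer.

```python
import random

# Hypothetical configuration space, loosely following the idea of elastic
# resolutions and channel multipliers (the exact values are illustrative).
RESOLUTIONS = [256, 512, 1024]
CHANNEL_RATIOS = [0.25, 0.5, 0.75, 1.0]

FULL_CHANNELS = 64  # toy full-generator width for a single layer


def sample_subgenerator_config(rng):
    """Sample one (resolution, channel-ratio) pair per training step,
    as in sampling-based multi-resolution / adaptive-channel training."""
    return rng.choice(RESOLUTIONS), rng.choice(CHANNEL_RATIOS)


def active_channels(full_channels, ratio):
    """Adaptive channels: a sub-generator uses only the first fraction of
    the full layer's channels, so its weights are shared with (a slice of)
    the full generator rather than trained separately."""
    return max(1, int(full_channels * ratio))


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility of the demo
    resolution, ratio = sample_subgenerator_config(rng)
    width = active_channels(FULL_CHANNELS, ratio)
    print(f"train step config: {resolution}px, {width}/{FULL_CHANNELS} channels")
```

Because every sampled sub-generator reuses a prefix slice of the full model's channels, a single set of weights serves all cost budgets; at deployment time one simply picks the largest configuration that meets the device's latency target.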