Generative adversarial networks (GANs) have an enormous potential impact on digital content creation, e.g., photo-realistic digital avatars, semantic content editing, and quality enhancement of speech and images. However, the performance of modern GANs comes at the cost of massive amounts of computation at inference time and high energy consumption. This complicates, and can even preclude, their deployment on edge devices. The problem can be reduced with quantization -- a neural network compression technique that facilitates hardware-friendly inference by replacing floating-point computations with low-bit integer ones. While quantization is well established for discriminative models, the performance of modern quantization techniques when applied to GANs remains unclear. GANs generate content with a more complex structure than the outputs of discriminative models, which makes their quantization significantly more challenging. To tackle this problem, we perform an extensive experimental study of state-of-the-art quantization techniques on three diverse GAN architectures, namely StyleGAN, Self-Attention GAN, and CycleGAN. As a result, we discovered practical recipes that allowed us to successfully quantize these models for inference with 4/8-bit weights and 8-bit activations while preserving the quality of the original full-precision models.
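To make the core idea concrete, the sketch below shows symmetric per-tensor uniform quantization of a weight tensor to low-bit integers. It is a minimal illustrative toy, not the specific calibration or quantization scheme evaluated in the study; the function names and the NumPy-based setup are our own assumptions.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Symmetric per-tensor uniform quantization of a float tensor.

    Returns the integer codes and the scale needed to dequantize.
    (Illustrative sketch; not the paper's exact procedure.)
    """
    qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for 8-bit, 7 for 4-bit
    scale = max(np.abs(x).max() / qmax, 1e-8)  # guard against all-zero input
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map integer codes back to approximate float values."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix to 8-bit integers and
# measure the reconstruction error introduced by quantization.
w = np.random.randn(256, 256).astype(np.float32)
q, s = quantize_uniform(w, num_bits=8)
w_hat = dequantize(q, s)
print("max abs error:", np.abs(w - w_hat).max())  # bounded by roughly scale / 2
```

With 8 bits the reconstruction error is typically negligible for well-scaled weights; at 4 bits the grid becomes 16 times coarser, which is why low-bit GAN quantization requires the more careful recipes the study investigates.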