We propose a new paradigm for facial video compression. We leverage the generative capacity of GANs such as StyleGAN to represent and compress a video, covering both intra-frame and inter-frame compression. Each frame is inverted into the latent space of StyleGAN, from which an optimal compression is learned. To do so, a diffeomorphic latent representation is learned using a normalizing flow model, in which an entropy model can be optimized for image coding. In addition, we propose a new perceptual loss that is more efficient than its counterparts. Finally, an entropy model for inter-frame video coding with residuals is also learned in the previously constructed latent representation. Our method (SGANC) is simple, faster to train, and achieves better results for image and video coding than state-of-the-art codecs such as VTM and AV1 and recent deep learning techniques. In particular, it drastically reduces perceptual distortion at low bit rates.
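To make the idea of inter coding with residuals in a latent space concrete, the following is a minimal sketch and not the SGANC implementation. It assumes each frame has already been inverted to a latent vector (the 512-dimensional vectors, the uniform quantizer, the step size, and all function names are illustrative assumptions). A zeroth-order empirical entropy stands in for the learned entropy model, and no GAN or normalizing flow is involved.

```python
import numpy as np


def quantize(x, step):
    """Uniform scalar quantization: map each value to an integer index."""
    return np.round(x / step).astype(np.int64)


def dequantize(q, step):
    """Map integer indices back to reconstruction values."""
    return q.astype(np.float64) * step


def encode_sequence(latents, step):
    """Closed-loop residual coding of per-frame latent vectors.

    The first latent is quantized directly (intra). Every later latent is
    coded as the quantized residual against the previous *reconstructed*
    latent (inter), so quantization error does not accumulate over time.
    """
    symbols = []
    prev = None
    for z in latents:
        if prev is None:
            q = quantize(z, step)
            prev = dequantize(q, step)
        else:
            q = quantize(z - prev, step)
            prev = prev + dequantize(q, step)
        symbols.append(q)
    return symbols


def decode_sequence(symbols, step):
    """Invert encode_sequence by accumulating dequantized residuals."""
    recon = []
    prev = None
    for q in symbols:
        prev = dequantize(q, step) if prev is None else prev + dequantize(q, step)
        recon.append(prev)
    return np.stack(recon)


def empirical_bits(symbols):
    """Zeroth-order entropy estimate (in bits) of all symbols pooled together.

    This is only a stand-in for a learned entropy model.
    """
    flat = np.concatenate([s.ravel() for s in symbols])
    _, counts = np.unique(flat, return_counts=True)
    p = counts / counts.sum()
    return float(-(counts * np.log2(p)).sum())


if __name__ == "__main__":
    # Hypothetical data: a slowly drifting latent trajectory, mimicking
    # consecutive frames of one face (an assumption, not real inversions).
    rng = np.random.default_rng(0)
    z0 = rng.normal(size=512)
    drift = rng.normal(scale=0.02, size=(29, 512))
    latents = np.vstack([z0, z0 + np.cumsum(drift, axis=0)])

    step = 0.05
    inter_symbols = encode_sequence(latents, step)
    intra_symbols = [quantize(z, step) for z in latents]
    print("intra-only bits:", empirical_bits(intra_symbols))
    print("residual bits:  ", empirical_bits(inter_symbols))
```

Because consecutive latents of the same face tend to be close, the residual symbols concentrate near zero and are cheaper to code than independently quantized latents. In this toy setting that is the motivation for learning an entropy model over residuals.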