Lossy image compression aims to represent images in as few bits as possible while maintaining fidelity to the original. Theoretical results indicate that optimizing distortion metrics such as PSNR or MS-SSIM necessarily causes the statistics of reconstructions to deviate from those of the original images, particularly at low bitrates, which often manifests as blurring in the compressed images. Previous work has leveraged adversarial discriminators to improve statistical fidelity. Yet these binary discriminators, adopted from generative modeling tasks, may not be ideal for image compression. In this paper, we introduce a non-binary discriminator that is conditioned on quantized local image representations obtained via VQ-VAE autoencoders. Our evaluations on the CLIC2020, DIV2K and Kodak datasets show that our discriminator is more effective for jointly optimizing distortion (e.g., PSNR) and statistical fidelity (e.g., FID) than the state-of-the-art HiFiC model. On the CLIC2020 test set, we obtain the same FID as HiFiC with 30-40% fewer bits.
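As a rough illustration of what a non-binary discriminator conditioned on quantized local representations could look like, the sketch below treats the discriminator as a per-location classifier over K + 1 classes: the K VQ-VAE codebook indices plus one extra "reconstruction" class. This is one plausible reading, not the formulation confirmed by the abstract. The class layout, the constant `FAKE_CLASS`, and the function names `discriminator_loss` and `generator_loss` are all assumptions made for the example.

```python
import math

# Assumption: class 0 is reserved for "this patch comes from a reconstruction";
# classes 1..K correspond to VQ-VAE codebook indices 0..K-1.
FAKE_CLASS = 0

def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def cross_entropy(logits_per_location, targets):
    """Mean cross-entropy over spatial locations."""
    total = 0.0
    for logits, target in zip(logits_per_location, targets):
        total -= log_softmax(logits)[target]
    return total / len(targets)

def discriminator_loss(logits_real, logits_recon, vq_labels):
    """Hypothetical discriminator objective.

    On original images the discriminator should predict the VQ-VAE label of
    each local patch; on reconstructions it should predict FAKE_CLASS.
    """
    real_targets = [k + 1 for k in vq_labels]      # shift past FAKE_CLASS
    fake_targets = [FAKE_CLASS] * len(vq_labels)
    return (cross_entropy(logits_real, real_targets)
            + cross_entropy(logits_recon, fake_targets))

def generator_loss(logits_recon, vq_labels):
    """Hypothetical adversarial term for the compression model.

    The decoder is rewarded when the discriminator assigns the reconstruction
    the same local labels that the original image received.
    """
    return cross_entropy(logits_recon, [k + 1 for k in vq_labels])
```

Compared with a binary real-versus-fake signal, a target of this kind tells the decoder which local structure is expected at each position, not only that the patch looks unrealistic. In a full model this term would be combined with the rate and distortion losses.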