Denoising diffusion models have recently marked a milestone in high-quality image generation. One may thus wonder whether they are suitable for neural image compression. This paper outlines an end-to-end optimized image compression framework based on a conditional diffusion model, drawing on the transform-coding paradigm. Besides the latent variables inherent to the diffusion process, we introduce an additional discrete ``content'' latent variable to condition the denoising process. This variable is equipped with a hierarchical prior for entropy coding. The remaining ``texture'' latent variables characterizing the diffusion process are synthesized (either stochastically or deterministically) at decoding time. We furthermore show that the performance can be tuned toward perceptual metrics of interest. Our extensive experiments involving five datasets and sixteen image quality assessment metrics show that our approach not only compares favorably in terms of rate and perceptual quality but also achieves distortion performance close to that of state-of-the-art models.
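To make the decoding procedure described above concrete, the following is a minimal sketch of conditional diffusion decoding, assuming a hypothetical epsilon-predictor (`ConditionalDenoiser`) conditioned on the quantized content latent `z_hat`; the real model uses a U-Net and the actual hyperparameters differ, and the ``texture'' latents correspond to the diffusion noise variables drawn at the decoder rather than transmitted.

```python
import torch
import torch.nn as nn


class ConditionalDenoiser(nn.Module):
    """Hypothetical epsilon-predictor conditioned on a quantized content latent z.

    The paper's model is a U-Net; a small conv stack keeps this sketch short.
    """

    def __init__(self, image_channels=3, latent_channels=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(image_channels + latent_channels + 1, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, image_channels, 3, padding=1),
        )

    def forward(self, x_t, t, z):
        # Broadcast the timestep to a feature map and concatenate it with the
        # noisy image x_t and the (upsampled) content latent z.
        t_map = t.view(-1, 1, 1, 1).expand(-1, 1, *x_t.shape[-2:]).float()
        z_up = nn.functional.interpolate(z, size=x_t.shape[-2:], mode="nearest")
        return self.net(torch.cat([x_t, z_up, t_map], dim=1))


def decode(denoiser, z_hat, num_steps=50, deterministic=True, image_size=(1, 3, 64, 64)):
    """DDIM-style decoding sketch: the texture latents are the diffusion noise
    variables, synthesized at the decoder (stochastically or deterministically)."""
    betas = torch.linspace(1e-4, 0.02, num_steps)
    alphas_bar = torch.cumprod(1.0 - betas, dim=0)
    x = torch.randn(image_size)  # initial texture latent x_T
    x0 = x
    for i in reversed(range(num_steps)):
        t = torch.full((image_size[0],), i)
        eps = denoiser(x, t, z_hat)
        a_bar = alphas_bar[i]
        a_prev = alphas_bar[i - 1] if i > 0 else torch.tensor(1.0)
        # Predicted clean image from the noise estimate.
        x0 = (x - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
        if deterministic:
            sigma = torch.tensor(0.0)  # DDIM with eta = 0: no fresh noise per step
        else:
            sigma = (((1 - a_prev) / (1 - a_bar)) * betas[i]).sqrt()  # eta = 1
        noise = torch.randn_like(x) if i > 0 else torch.zeros_like(x)
        x = a_prev.sqrt() * x0 + (1 - a_prev - sigma**2).clamp(min=0).sqrt() * eps + sigma * noise
    return x0
```

Only `z_hat` needs to be entropy-coded under the hierarchical prior; the noise trajectory above is regenerated at the receiver, which is why the deterministic (DDIM, eta = 0) and stochastic variants trade off between reproducibility and sample diversity.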