Existing diffusion codecs typically build on text-to-image diffusion foundation models such as Stable Diffusion. However, text conditioning is suboptimal from a compression perspective, limiting the potential of downstream diffusion codecs, particularly at ultra-low bitrates. To address this, we introduce \textbf{CoD}, the first \textbf{Co}mpression-oriented \textbf{D}iffusion foundation model, trained from scratch to enable end-to-end optimization of both compression and generation. CoD is not a fixed codec but a general foundation model designed to support a variety of diffusion-based codecs. It offers several advantages: \textbf{High compression efficiency}: replacing Stable Diffusion with CoD in downstream codecs such as DiffC achieves SOTA results, especially at ultra-low bitrates (e.g., 0.0039 bpp); \textbf{Low-cost, reproducible training}: 300$\times$ faster training than Stable Diffusion ($\sim$20 vs. $\sim$6,250 A100 GPU days) on entirely open image-only datasets; \textbf{New insights}: e.g., we find that pixel-space diffusion can achieve VTM-level PSNR with high perceptual quality and can outperform GAN-based codecs while using fewer parameters. We hope CoD lays the foundation for future diffusion codec research. Code will be released.