Diffusion models have proven highly effective at generating high-quality images. However, adapting large pre-trained diffusion models to new domains remains an open challenge, which is critical for real-world applications. This paper proposes DiffFit, a parameter-efficient strategy for fine-tuning large pre-trained diffusion models that enables fast adaptation to new domains. DiffFit is embarrassingly simple: it fine-tunes only the bias terms and newly added scaling factors in specific layers, yet yields significant training speed-ups and reduced model storage costs. Compared with full fine-tuning, DiffFit achieves a 2$\times$ training speed-up and needs to store only approximately 0.12\% of the total model parameters. We provide an intuitive theoretical analysis to justify the efficacy of the scaling factors for fast adaptation. On 8 downstream datasets, DiffFit achieves superior or competitive performance compared to full fine-tuning while being more efficient. Remarkably, we show that DiffFit can adapt a pre-trained low-resolution generative model to high resolution at minimal cost. Among diffusion-based methods, DiffFit sets a new state-of-the-art FID of 3.02 on the ImageNet 512$\times$512 benchmark by fine-tuning for only 25 epochs from a public pre-trained ImageNet 256$\times$256 checkpoint, while being 30$\times$ more training-efficient than the closest competitor.
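The core recipe, fine-tuning only bias terms and newly added scaling factors while freezing all other weights, can be sketched in a few lines of PyTorch. This is a minimal illustration under stated assumptions: the `ScaledLinear` module and the layer sizes are hypothetical stand-ins, not the paper's actual diffusion-transformer architecture, and the per-layer scalar `gamma` is one plausible reading of "newly-added scaling factors in specific layers".

```python
import torch
import torch.nn as nn


class ScaledLinear(nn.Module):
    """Illustrative layer: a frozen linear transform with a learnable
    scalar `gamma` (the newly added scaling factor), initialized to 1
    so the pre-trained behavior is preserved at the start of tuning."""

    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * self.linear(x)


# Toy "pre-trained" model standing in for a large diffusion backbone.
model = nn.Sequential(ScaledLinear(16, 16), ScaledLinear(16, 16))

# DiffFit-style selection: train only biases and scaling factors.
for name, param in model.named_parameters():
    param.requires_grad = ("bias" in name) or ("gamma" in name)

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in model.parameters())
print(trainable)                      # only bias and gamma parameters
print(n_trainable / n_total)          # small fraction of total parameters
```

Only the selected parameters need to be passed to the optimizer (e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`) and stored per downstream domain, which is what drives the storage cost down to a fraction of a percent in the full-scale setting.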