Denoising diffusion models have become a mainstream approach for image generation; however, training these models often suffers from slow convergence. In this paper, we discovered that the slow convergence is partly due to conflicting optimization directions between timesteps. To address this issue, we treat diffusion training as a multi-task learning problem and introduce a simple yet effective approach referred to as Min-SNR-$\gamma$. This method adapts the loss weights of timesteps based on clamped signal-to-noise ratios, which effectively balances the conflicts among timesteps. Our results demonstrate a significant improvement in convergence speed, 3.4$\times$ faster than previous weighting strategies. The method is also more effective, achieving a new record FID score of 2.06 on the ImageNet $256\times256$ benchmark using smaller architectures than those employed in previous state-of-the-art methods. The code is available at https://github.com/TiankaiHang/Min-SNR-Diffusion-Training.
Title: Efficient Diffusion Training via Min-SNR Weighting Strategy
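The clamped-SNR weighting described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear beta schedule, the choice $\gamma = 5$, and the function names are assumptions made for the example, and the weight is shown in its simplest form $w_t = \min(\mathrm{SNR}(t), \gamma)$ (the paper derives prediction-target-specific variants of this weight).

```python
import numpy as np

def snr(alphas_cumprod):
    """Signal-to-noise ratio SNR(t) = alpha_bar_t / (1 - alpha_bar_t)."""
    return alphas_cumprod / (1.0 - alphas_cumprod)

def min_snr_gamma_weights(alphas_cumprod, gamma=5.0):
    """Per-timestep loss weights clamped at gamma: w_t = min(SNR(t), gamma).

    Early (low-noise) timesteps have huge SNR and would otherwise dominate
    the loss; clamping caps their contribution, while late (high-noise)
    timesteps keep their naturally small SNR-based weight.
    """
    return np.minimum(snr(alphas_cumprod), gamma)

# Illustrative linear beta schedule with 1000 steps (an assumption,
# not taken from the abstract).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)
weights = min_snr_gamma_weights(alphas_cumprod)
```

In this sketch, the first timesteps are clamped to exactly $\gamma$, and the weights decay toward zero as noise increases, which is the balancing effect the abstract attributes to the method.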