Denoising diffusion models have become a mainstream approach for image generation; however, training these models often suffers from slow convergence. In this paper, we find that the slow convergence is partly due to conflicting optimization directions between timesteps. To address this issue, we treat diffusion training as a multi-task learning problem and introduce a simple yet effective approach referred to as Min-SNR-$\gamma$. This method adapts the loss weights of timesteps based on clamped signal-to-noise ratios, which effectively balances the conflicts among timesteps. Our results demonstrate a significant improvement in convergence speed, 3.4$\times$ faster than previous weighting strategies. It is also more effective, achieving a new record FID score of 2.06 on the ImageNet $256\times256$ benchmark using smaller architectures than those employed in previous state-of-the-art methods.
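The clamped-SNR weighting described above can be sketched as follows. This is a minimal illustration based only on the abstract: the function name is hypothetical, the $\epsilon$-prediction form of the weight ($\min(\mathrm{SNR}, \gamma)/\mathrm{SNR}$) and the default $\gamma = 5$ are assumptions, not details stated here.

```python
def min_snr_gamma_weight(snr: float, gamma: float = 5.0) -> float:
    """Min-SNR-gamma loss weight for one timestep (illustrative sketch).

    The signal-to-noise ratio is clamped at gamma so that low-noise
    timesteps (very large SNR) cannot dominate the training objective.
    Assumed epsilon-prediction form: min(SNR, gamma) / SNR.
    For x0-prediction the weight would instead be min(SNR, gamma).
    """
    return min(snr, gamma) / snr
```

For example, a timestep with SNR = 100 receives weight 0.05, while any timestep with SNR at or below $\gamma$ keeps weight 1, down-weighting the easy low-noise steps rather than letting them dominate.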