Recent advances in diffusion models have brought state-of-the-art performance on image generation tasks. However, empirical results from previous research on diffusion models suggest an inverse correlation between density estimation and sample generation performance. This paper provides empirical evidence that this inverse correlation arises because density estimation is dominated by contributions from small diffusion times, whereas sample generation mainly depends on large diffusion times. However, training a score network well across the entire range of diffusion times is demanding because the loss scale is significantly imbalanced across diffusion times. For successful training, therefore, we introduce Soft Truncation, a universally applicable training technique for diffusion models that softens the fixed, static truncation hyperparameter into a random variable. In experiments, Soft Truncation achieves state-of-the-art performance on the CIFAR-10, CelebA, CelebA-HQ 256x256, and STL-10 datasets.
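The core idea of softening the truncation hyperparameter can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it contrasts sampling training times `t` with a fixed truncation against resampling the truncation itself per minibatch. The log-uniform prior over the truncation used here is an assumption; the paper's exact prior may differ.

```python
import numpy as np

T = 1.0            # terminal diffusion time
FIXED_EPS = 1e-5   # conventional static truncation hyperparameter

def sample_times_fixed(batch_size, rng):
    """Standard training: t ~ Uniform(FIXED_EPS, T) with a static truncation."""
    return rng.uniform(FIXED_EPS, T, size=batch_size)

def sample_times_soft(batch_size, rng):
    """Soft Truncation sketch: the truncation eps is itself a random variable,
    redrawn each minibatch, so small diffusion times are sometimes excluded
    and sometimes emphasized. Here eps is drawn log-uniformly on
    [FIXED_EPS, T] as an illustrative choice of prior."""
    u = rng.uniform()
    eps = FIXED_EPS * (T / FIXED_EPS) ** u  # log-uniform draw of the truncation
    return rng.uniform(eps, T, size=batch_size)

rng = np.random.default_rng(0)
t_fixed = sample_times_fixed(128, rng)  # times from a static truncation
t_soft = sample_times_soft(128, rng)    # times from a randomized truncation
```

Because the effective truncation varies across minibatches, the score network sees loss contributions from both small and large diffusion times without any single minibatch being dominated by the imbalanced loss scale at small times.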