Score-based diffusion models synthesize samples by reversing a stochastic process that diffuses data to noise, and are trained by minimizing a weighted combination of score matching losses. The log-likelihood of score-based models can be tractably computed through a connection to continuous normalizing flows, but log-likelihood is not directly optimized by the weighted combination of score matching losses. We show that for a specific weighting scheme, the objective upper bounds the negative log-likelihood, thus enabling approximate maximum likelihood training of score-based models. We empirically observe that maximum likelihood training consistently improves the likelihood of score-based models across multiple datasets, stochastic processes, and model architectures. Our best models achieve negative log-likelihoods of 2.74 and 3.76 bits/dim on CIFAR-10 and down-sampled ImageNet, outperforming all existing likelihood-based models.
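To make the training objective concrete, here is a minimal sketch (not the authors' released code) of denoising score matching under a VP SDE with the likelihood weighting λ(t) = g(t)², the weighting for which the objective upper bounds the negative log-likelihood. The linear β(t) schedule, the sampling range for t, and the placeholder `score_net` are illustrative assumptions.

```python
import torch

beta_min, beta_max = 0.1, 20.0  # assumed linear beta(t) schedule on t in [0, 1]

def beta(t):
    return beta_min + t * (beta_max - beta_min)

def alpha_sigma(t):
    # Mean/std of the VP perturbation kernel p_t(x_t | x_0):
    # alpha(t) = exp(-0.5 * int_0^t beta(s) ds), sigma(t)^2 = 1 - alpha(t)^2.
    log_alpha = -0.5 * (beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2)
    alpha = torch.exp(log_alpha)
    sigma = torch.sqrt(1.0 - alpha ** 2)
    return alpha, sigma

def likelihood_weighted_dsm_loss(score_net, x0, eps_t=1e-5):
    """Monte Carlo estimate of the weighted DSM objective with lambda(t) = g(t)^2.

    For the VP SDE g(t)^2 = beta(t); with this weighting the objective upper
    bounds the negative log-likelihood (up to a theta-independent constant).
    """
    t = torch.rand(x0.shape[0]) * (1.0 - eps_t) + eps_t  # avoid t = 0
    alpha, sigma = alpha_sigma(t)
    noise = torch.randn_like(x0)
    xt = alpha[:, None] * x0 + sigma[:, None] * noise
    target = -noise / sigma[:, None]      # score of p_t(x_t | x_0) at xt
    residual = score_net(xt, t) - target
    return (beta(t) * residual.pow(2).sum(dim=-1)).mean()  # likelihood weighting

# Hypothetical usage on 2-D toy data; score_net is a stand-in, not a trained model.
score_net = lambda x, t: -x / (1.0 + t[:, None])
x0 = torch.randn(16, 2)
print(likelihood_weighted_dsm_loss(score_net, x0))
```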
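The tractable log-likelihood mentioned above comes from viewing the score model as a continuous normalizing flow through the probability flow ODE. Below is a sketch of that computation under the same VP SDE assumptions, reusing `beta` and `score_net` from the block above; the fixed-step Euler integrator and the Hutchinson trace estimator are simplifications standing in for the adaptive ODE solvers typically used in practice.

```python
import math
import torch

def probability_flow_loglik(score_net, x0, n_steps=200, eps_t=1e-5):
    """Estimate log p_0(x0) via the probability flow ODE:

    log p_0(x0) = log p_1(x(1)) + int div[f - 0.5 g^2 s_theta](x(t), t) dt,

    with VP drift f(x, t) = -0.5 beta(t) x and g(t)^2 = beta(t)."""
    x = x0.clone()
    delta_logp = torch.zeros(x0.shape[0])
    ts = torch.linspace(eps_t, 1.0, n_steps + 1)
    for i in range(n_steps):
        t = ts[i].expand(x0.shape[0])
        dt = (ts[i + 1] - ts[i]).item()
        x = x.detach().requires_grad_(True)
        b = beta(t)[:, None]
        drift = -0.5 * b * x - 0.5 * b * score_net(x, t)  # probability flow drift
        # Hutchinson estimator of the drift's divergence (Jacobian trace).
        v = torch.randn_like(x)
        (grad,) = torch.autograd.grad((drift * v).sum(), x)
        div = (grad * v).sum(dim=-1)
        x = (x + drift * dt).detach()          # Euler step toward t = 1
        delta_logp = delta_logp + div.detach() * dt
    # The VP SDE prior at t = 1 is (approximately) a standard normal.
    d = x.shape[-1]
    prior_logp = -0.5 * (x.pow(2).sum(dim=-1) + d * math.log(2 * math.pi))
    return prior_logp + delta_logp

print(probability_flow_loglik(score_net, torch.randn(4, 2)))
```

Integrating the divergence forward in time follows the instantaneous change-of-variables formula for continuous normalizing flows; it is this formula that makes the model's log-likelihood tractable even though the training objective never optimizes it directly.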