Score-based diffusion models synthesize samples by reversing a stochastic process that diffuses data to noise, and are trained by minimizing a weighted combination of score matching losses. The log-likelihood of score-based models can be tractably computed through a connection to continuous normalizing flows, but log-likelihood is not directly optimized by the weighted combination of score matching losses. We show that for a specific weighting scheme, the objective upper bounds the negative log-likelihood, thus enabling approximate maximum likelihood training of score-based models. We empirically observe that maximum likelihood training consistently improves the likelihood of score-based models across multiple datasets, stochastic processes, and model architectures. Our best models achieve negative log-likelihoods of 2.74 and 3.76 bits/dim on CIFAR-10 and ImageNet 32x32, outperforming autoregressive models on these tasks.
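The abstract itself contains no code, so the following is only a minimal sketch of the weighted denoising score matching objective it describes, specialized to the common VP SDE where the likelihood weighting is λ(t) = g(t)² = β(t). The function names (`vp_coeffs`, `dsm_loss`), the β_min/β_max schedule values, and the `score_model` interface are illustrative assumptions, not taken from the source.

```python
# Sketch of likelihood-weighted denoising score matching for a VP SDE.
# Assumptions: score_model(x, t) is any network estimating grad_x log p_t(x);
# beta_min/beta_max follow a common VP-SDE parameterization.
import torch

def vp_coeffs(t, beta_min=0.1, beta_max=20.0):
    """Perturbation-kernel coefficients of the VP SDE at time t in (0, 1]."""
    log_alpha = -0.5 * (beta_min * t + 0.5 * (beta_max - beta_min) * t ** 2)
    alpha = torch.exp(log_alpha)                 # mean scaling of p_t(x_t | x_0)
    sigma = torch.sqrt(1.0 - alpha ** 2)         # std of p_t(x_t | x_0)
    beta = beta_min + (beta_max - beta_min) * t  # g(t)^2 = beta(t) for the VP SDE
    return alpha, sigma, beta

def dsm_loss(score_model, x0, likelihood_weighting=True, eps=1e-5):
    """Monte Carlo estimate of the weighted score matching objective.

    With likelihood_weighting=True, lambda(t) = g(t)^2, the weighting that makes
    the objective an upper bound on negative log-likelihood; otherwise
    lambda(t) = sigma(t)^2, the usual sample-quality weighting.
    """
    # Draw t away from 0 to keep sigma(t) nonzero.
    t = torch.rand(x0.shape[0], device=x0.device) * (1.0 - eps) + eps
    alpha, sigma, beta = vp_coeffs(t)
    shape = (-1,) + (1,) * (x0.dim() - 1)        # broadcast (B,) over data dims
    noise = torch.randn_like(x0)
    xt = alpha.view(shape) * x0 + sigma.view(shape) * noise
    score = score_model(xt, t)
    # Score of the Gaussian perturbation kernel: -(xt - alpha*x0)/sigma^2 = -noise/sigma
    target = -noise / sigma.view(shape)
    sq_err = ((score - target) ** 2).flatten(1).sum(dim=1)
    lam = beta if likelihood_weighting else sigma ** 2
    return (lam * sq_err).mean()
```

Toggling `likelihood_weighting` switches between the two training regimes the abstract contrasts: the bound-based weighting used for approximate maximum likelihood training, and the standard weighting used for sample quality.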