Score-based diffusion models are a class of generative models whose dynamics are described by stochastic differential equations that map noise into data. While recent works have started to lay down a theoretical foundation for these models, an analytical understanding of the role of the diffusion time T is still lacking. Current best practice advocates a large T, so that the forward dynamics bring the diffusion sufficiently close to a known and simple noise distribution; however, a smaller T is preferable for a better approximation of the score-matching objective and for higher computational efficiency. Starting from a variational interpretation of diffusion models, we quantify this trade-off and propose a new method that improves the quality and efficiency of both training and sampling by adopting smaller diffusion times. Specifically, we show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process. Empirical results support our analysis; for image data, our method is competitive with the state of the art according to standard sample quality metrics and log-likelihood.
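To make the role of T concrete, the following is a minimal sketch, not the method or the experiments of this work. It assumes a one-dimensional Gaussian data distribution and a variance-preserving forward SDE with a constant rate `beta`, dx = -0.5 * beta * x dt + sqrt(beta) dW, for which the marginal at time T is Gaussian in closed form. The function names and the values of `mean0`, `var0`, and `beta` are hypothetical choices made for illustration. The sketch computes the KL divergence between the forward marginal at time T and the standard normal noise distribution, which is one way to measure how far a finite-T forward process ends from the ideal noise distribution.

```python
import math


def forward_marginal(mean0, var0, T, beta=1.0):
    """Mean and variance at time T of dx = -0.5*beta*x dt + sqrt(beta) dW,
    started from N(mean0, var0). Assumed toy setting, not the paper's."""
    decay = math.exp(-beta * T)
    mean_T = mean0 * math.sqrt(decay)       # mean shrinks as exp(-beta*T/2)
    var_T = var0 * decay + (1.0 - decay)    # variance relaxes towards 1
    return mean_T, var_T


def kl_to_standard_normal(mean, var):
    """KL( N(mean, var) || N(0, 1) ) for scalar Gaussians."""
    return 0.5 * (var + mean ** 2 - 1.0 - math.log(var))


def prior_mismatch(T, mean0=2.0, var0=0.25, beta=1.0):
    """Gap between the forward marginal at time T and the noise prior."""
    mean_T, var_T = forward_marginal(mean0, var0, T, beta)
    return kl_to_standard_normal(mean_T, var_T)


if __name__ == "__main__":
    for T in (0.5, 1.0, 2.0, 5.0, 10.0):
        print(f"T={T:5.1f}  KL(q_T || N(0,1)) = {prior_mismatch(T):.6f}")
```

In this toy setting the mismatch shrinks monotonically as T grows, which is the argument for a large T; the cost of a large T (a harder score-matching approximation and more computation) is not modeled here.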