Score-based generative models (SGMs) have demonstrated remarkable synthesis quality. SGMs rely on a diffusion process that gradually perturbs the data towards a tractable distribution, while the generative model learns to denoise. The complexity of this denoising task is, apart from the data distribution itself, uniquely determined by the diffusion process. We argue that current SGMs employ overly simplistic diffusions, leading to unnecessarily complex denoising processes, which limit generative modeling performance. Based on connections to statistical mechanics, we propose a novel critically-damped Langevin diffusion (CLD) and show that CLD-based SGMs achieve superior performance. CLD can be interpreted as running a joint diffusion in an extended space, where the auxiliary variables can be considered "velocities" that are coupled to the data variables as in Hamiltonian dynamics. We derive a novel score matching objective for CLD and show that the model only needs to learn the score function of the conditional distribution of the velocity given data, an easier task than learning scores of the data directly. We also derive a new sampling scheme for efficient synthesis from CLD-based diffusion models. We find that CLD outperforms previous SGMs in synthesis quality for similar network architectures and sampling compute budgets. We show that our novel sampler for CLD significantly outperforms solvers such as Euler-Maruyama. Our framework provides new insights into score-based denoising diffusion models and can be readily used for high-resolution image synthesis. Project page and code: https://nv-tlabs.github.io/CLD-SGM.
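To make the idea of a joint diffusion over data and "velocity" variables concrete, the following is a minimal sketch of a forward simulation with a plain Euler-Maruyama discretization. The abstract does not state the SDE, so the specific form used here (Hamiltonian coupling between x and v, friction and noise acting only on v, and the critical-damping condition gamma^2 = 4 * mass) is an assumption, as are the function name `simulate_cld`, the parameter values, and the zero initial velocity. It is not the paper's sampler, only an illustration of how the data is perturbed indirectly through the velocity.

```python
import numpy as np


def simulate_cld(x0, beta=4.0, mass=0.25, t_end=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama simulation of a critically damped Langevin diffusion.

    Assumed SDE (an illustration, not taken from the text above):
        dx = beta * v / mass * dt
        dv = -beta * x * dt - beta * gamma * v / mass * dt
             + sqrt(2 * gamma * beta) * dW
    with gamma = 2 * sqrt(mass), i.e. gamma^2 = 4 * mass (critical damping).
    Under this form the stationary law is x ~ N(0, 1), v ~ N(0, mass).
    """
    rng = np.random.default_rng(seed)
    gamma = 2.0 * np.sqrt(mass)  # friction chosen for critical damping
    dt = t_end / n_steps

    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)  # hypothetical choice: velocities start at rest

    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Data variables move only through the velocity (no direct noise on x).
        x_new = x + beta * v / mass * dt
        # Velocity feels the restoring force, friction, and injected noise.
        v_new = (
            v
            - beta * (x + gamma * v / mass) * dt
            + np.sqrt(2.0 * gamma * beta * dt) * noise
        )
        x, v = x_new, v_new
    return x, v


if __name__ == "__main__":
    # Start every sample far from the origin and diffuse towards equilibrium.
    x_T, v_T = simulate_cld(np.full(20000, 3.0))
    print("x mean/var:", x_T.mean(), x_T.var())
    print("v var:", v_T.var())
```

Because noise enters only through the velocity channel, the data trajectories in this sketch are smoother than under a diffusion that injects noise into x directly, which is one way to read the abstract's claim that the choice of diffusion shapes the difficulty of the denoising task.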