This monograph presents the core principles that have guided the development of diffusion models, tracing their origins and showing how diverse formulations arise from shared mathematical ideas. Diffusion modeling starts by defining a forward process that gradually corrupts data into noise, linking the data distribution to a simple prior through a continuum of intermediate distributions. The goal is to learn a reverse process that transforms noise back into data while recovering the same intermediates. We describe three complementary views. The variational view, inspired by variational autoencoders, sees diffusion as learning to remove noise step by step. The score-based view, rooted in energy-based modeling, learns the gradient of the evolving data distribution, indicating how to nudge samples toward more likely regions. The flow-based view, related to normalizing flows, treats generation as following a smooth path that moves samples from noise to data under a learned velocity field. These perspectives share a common backbone: a time-dependent velocity field whose flow transports a simple prior to the data. Sampling then amounts to solving a differential equation that evolves noise into data along a continuous trajectory. On this foundation, the monograph discusses guidance for controllable generation, efficient numerical solvers, and diffusion-motivated flow-map models that learn direct mappings between arbitrary times. It provides a conceptual and mathematically grounded understanding of diffusion models for readers with basic deep-learning knowledge.
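The claim that sampling reduces to solving a differential equation under a velocity field can be made concrete with a toy sketch. The snippet below is not from the monograph; it assumes a flow-matching-style linear path x_t = (1 - t) x0 + t z between data and Gaussian noise, and uses a closed-form velocity for data concentrated at a single point x0 as a stand-in for a learned network, so that Euler integration from t = 1 (noise) back to t ≈ 0 recovers the data:

```python
import numpy as np

def forward(x0, t, rng):
    # Forward process: interpolate data toward noise along
    # x_t = (1 - t) * x0 + t * z with z ~ N(0, I).
    z = rng.standard_normal(np.shape(x0))
    return (1.0 - t) * x0 + t * z

def velocity(x, t, x0):
    # Closed-form velocity field when the "data distribution" is a single
    # point x0 (a toy substitute for a learned network): v(x, t) = (x - x0) / t.
    # Its flow transports the Gaussian prior at t = 1 onto x0 at t = 0.
    return (x - x0) / t

def sample(x0, steps=1000, rng=None):
    # Generation: draw noise from the prior and integrate dx/dt = v(x, t)
    # from t = 1 down to t ~ 0 with explicit Euler steps.
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(np.shape(x0))   # sample the simple prior
    ts = np.linspace(1.0, 1e-3, steps + 1)  # stop just above t = 0
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity(x, t_cur, x0)  # Euler step
    return x

x0 = np.array([2.0, -1.0])   # hypothetical "data" point
x_gen = sample(x0)           # noise transported back to the data
```

For this linear ODE the Euler recursion is exact, so `x_gen` lands within about 10^-3 of `x0`; with a learned velocity field the same loop would trade solver steps against accuracy, which is what the monograph's discussion of efficient numerical solvers addresses.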