Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) showed that diffusion processes that transform data into noise can be reversed by learning the score function, i.e., the gradient of the log-density of the perturbed data. They propose plugging the learned score function into a reverse-time formula to define a generative diffusion process. Despite this empirical success, a theoretical underpinning of the procedure is still lacking. In this work, we approach the (continuous-time) generative diffusion directly and derive a variational framework for likelihood estimation, which includes continuous-time normalizing flows as a special case and can be seen as an infinitely deep variational autoencoder. Under this framework, we show that minimizing the score-matching loss is equivalent to maximizing a lower bound on the likelihood of the plug-in reverse SDE proposed by Song et al. (2021), bridging the theoretical gap.
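As a concrete illustration of the score-matching loss the abstract refers to, the sketch below implements the standard denoising score-matching objective for a single Gaussian noise level. This is a generic textbook construction, not code from the paper; the names `dsm_loss` and `score_fn` are our own, and the σ²-weighting is one common convention among several.

```python
import numpy as np

def dsm_loss(score_fn, x, sigma, rng):
    """Denoising score-matching loss at one noise level sigma.

    Data x is perturbed with Gaussian noise of std sigma; the regression
    target for the score of the perturbation kernel N(x, sigma^2 I) is
    -(x_tilde - x) / sigma^2 = -noise / sigma. We weight the squared
    residual by sigma^2 so losses are comparable across noise levels.
    """
    noise = rng.normal(size=x.shape)
    x_tilde = x + sigma * noise
    # sigma^2-weighted residual: sigma * s(x_tilde) - sigma * target
    #                          = sigma * s(x_tilde) + noise
    residual = sigma * score_fn(x_tilde) + noise
    return np.mean(residual ** 2)
```

For data drawn from a standard Gaussian, the perturbed marginal is N(0, 1 + σ²), so the true score is s*(x) = −x / (1 + σ²); on large samples the loss above is minimized (up to Monte Carlo error) by this score, which is a quick sanity check for the implementation.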