Denoising diffusion models are a recent class of generative models exhibiting state-of-the-art performance in image and audio synthesis. Such models approximate the time-reversal of a forward noising process that takes a target distribution to a reference distribution, which is usually Gaussian. Despite their strong empirical results, the theoretical analysis of such models remains limited. In particular, all current approaches crucially assume that the target distribution admits a density w.r.t. the Lebesgue measure. This assumption excludes settings where the target distribution is supported on a lower-dimensional manifold or is given by an empirical distribution. In this paper, we bridge this gap by providing the first convergence results for diffusion models in this more general setting. Specifically, we provide quantitative bounds on the Wasserstein distance of order one between the target data distribution and the generative distribution of the diffusion model.
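To make the setting concrete, the following is a minimal sketch, not the paper's method. It assumes an Ornstein-Uhlenbeck forward process (the abstract does not specify which noising process is used) and a toy target supported on the unit circle in R^2, which has no Lebesgue density. All function names are hypothetical. It noises the samples and measures the empirical order-one Wasserstein distance between the first coordinate and Gaussian reference samples; this one-dimensional marginal distance only illustrates the metric and is not the bound derived in the paper.

```python
import numpy as np


def sample_circle(n, rng):
    """Toy target (assumption): uniform on the unit circle, a 1-D manifold in R^2."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)


def forward_noise(x0, t, rng):
    """Sample X_t given X_0 under an OU forward process (assumed here):
    X_t = exp(-t) X_0 + sqrt(1 - exp(-2t)) Z, with Z standard Gaussian."""
    z = rng.standard_normal(x0.shape)
    return np.exp(-t) * x0 + np.sqrt(1.0 - np.exp(-2.0 * t)) * z


def empirical_w1_1d(a, b):
    """Order-one Wasserstein distance between two 1-D empirical
    distributions with the same number of samples: the mean absolute
    difference of the sorted samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


rng = np.random.default_rng(0)
x0 = sample_circle(5000, rng)
reference = rng.standard_normal(5000)  # samples from the Gaussian reference

# Distance of the first coordinate of X_t to the reference at several times.
w1_by_time = {
    t: empirical_w1_1d(forward_noise(x0, t, rng)[:, 0], reference)
    for t in (0.0, 0.5, 5.0)
}
for t, w1 in w1_by_time.items():
    print(f"t = {t:>3}: empirical W1 to reference = {w1:.3f}")
```

Under these assumptions the distance to the reference shrinks as t grows, which is the forward half of the picture; a diffusion model would additionally learn to approximate the time-reversal of this process, which the sketch does not attempt.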