Diffusion models achieve state-of-the-art performance in various generation tasks. However, their theoretical foundations lag far behind. This paper studies score approximation, estimation, and distribution recovery of diffusion models when data are supported on an unknown low-dimensional linear subspace. Our results provide sample complexity bounds for distribution estimation using diffusion models. We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated. Furthermore, the generated distribution based on the estimated score function captures the geometric structure of the data and converges to a close vicinity of the data distribution. The convergence rate depends on the subspace dimension, indicating that diffusion models can circumvent the curse of the ambient data dimensionality.
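To make the setting concrete, the following is a minimal toy sketch (not the paper's construction) of data supported on an unknown d-dimensional linear subspace of R^D, pushed through a variance-preserving forward diffusion. It assumes Gaussian latent coordinates so that the score of the noised marginal has a closed form, which splits into an on-subspace component and an orthogonal component; this is the kind of geometric structure the abstract refers to. The dimensions D, d, the time t, and the function names are illustrative choices, not quantities from the paper.

```python
# Toy sketch: data x_0 = A z on a d-dimensional subspace of R^D, with
# orthonormal basis A and Gaussian latents z ~ N(0, I_d), noised by an
# Ornstein-Uhlenbeck forward process x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) eps.
import numpy as np

rng = np.random.default_rng(0)
D, d = 50, 3                                       # ambient and subspace dimensions (arbitrary)
A, _ = np.linalg.qr(rng.standard_normal((D, d)))   # orthonormal basis of the unknown subspace

def forward_marginal_cov(t):
    """Covariance of x_t for the Gaussian toy model: P + sigma_t^2 (I - P)."""
    sigma2 = 1.0 - np.exp(-2.0 * t)
    P = A @ A.T                                    # projector onto the data subspace
    # e^{-2t} + sigma_t^2 = 1, so the on-subspace variance stays 1
    return P + sigma2 * (np.eye(D) - P)

def score(x, t):
    """Closed-form score grad log p_t(x) = -Sigma_t^{-1} x for the Gaussian toy model."""
    sigma2 = 1.0 - np.exp(-2.0 * t)
    P = A @ A.T
    # Sigma_t is identity on the subspace and sigma_t^2 off it, so its inverse
    # acts as I on-subspace and 1/sigma_t^2 on the orthogonal complement.
    return -(P @ x + (np.eye(D) - P) @ x / sigma2)

# Sample x_t and sanity-check the closed form against a direct linear solve.
t = 0.5
z = rng.standard_normal(d)
x0 = A @ z
xt = np.exp(-t) * x0 + np.sqrt(1.0 - np.exp(-2.0 * t)) * rng.standard_normal(D)
Sigma = forward_marginal_cov(t)
assert np.allclose(score(xt, t), -np.linalg.solve(Sigma, xt), atol=1e-8)
```

In this toy model the score only needs the d-dimensional projection A.T @ x plus a simple linear correction in the orthogonal directions, which gives some intuition for why a suitably structured score network, and the resulting convergence rate, can depend on the subspace dimension d rather than the ambient dimension D.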