Score-based generative models (SGMs) need to approximate the scores $\nabla \log p_t$ of the intermediate distributions as well as the final distribution $p_T$ of the forward process. The theoretical underpinnings of the effects of these approximations are still lacking. We find precise conditions under which SGMs are able to produce samples from an underlying (low-dimensional) data manifold $\mathcal{M}$. This assures us that SGMs are able to generate the "right kind of samples". For example, taking $\mathcal{M}$ to be the subset of images of faces, we find conditions under which the SGM robustly produces an image of a face, even though the relative frequencies of these images might not accurately represent the true data generating distribution. Moreover, this analysis is a first step towards understanding the generalization properties of SGMs: Taking $\mathcal{M}$ to be the set of all training samples, our results provide a precise description of when the SGM memorizes its training data.
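To make the two approximations concrete, consider the standard variance-preserving setup (an illustrative choice; the abstract does not fix a particular forward process). The forward process is the Ornstein–Uhlenbeck SDE

$$\mathrm{d}X_t = -\tfrac{1}{2} X_t \,\mathrm{d}t + \mathrm{d}B_t, \qquad X_0 \sim p_{\mathrm{data}},$$

whose time-$t$ marginal is $p_t$, and sampling runs the time reversal

$$\mathrm{d}Y_t = \Big(\tfrac{1}{2} Y_t + \nabla \log p_{T-t}(Y_t)\Big)\,\mathrm{d}t + \mathrm{d}\bar{B}_t, \qquad Y_0 \sim p_T,$$

which satisfies $Y_T \sim p_{\mathrm{data}}$ when run exactly. In practice, a learned network $s_\theta(t, \cdot) \approx \nabla \log p_t$ replaces the true scores, and a Gaussian $\mathcal{N}(0, I) \approx p_T$ replaces the exact terminal distribution; these are the two approximations whose effects are analyzed here.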