Score-based generative modeling (SGM) is a highly successful approach for learning a probability distribution from data and generating further samples. We prove the first polynomial convergence guarantees for the core mechanic behind SGM: drawing samples from a probability density $p$ given a score estimate (an estimate of $\nabla \ln p$) that is accurate in $L^2(p)$. Compared to previous works, we do not incur error that grows exponentially in time or that suffers from a curse of dimensionality. Our guarantee works for any smooth distribution and depends polynomially on its log-Sobolev constant. Using our guarantee, we give a theoretical analysis of score-based generative modeling, which transforms white-noise input into samples from a learned data distribution given score estimates at different noise scales. Our analysis gives theoretical grounding to the observation that an annealed procedure is required in practice to generate good samples, as our proof depends essentially on using annealing to obtain a warm start at each step. Moreover, we show that a predictor-corrector algorithm gives better convergence than using either portion alone.
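For intuition, the following is a minimal sketch of the kind of annealed predictor-corrector sampler the abstract refers to: a reverse-diffusion predictor step followed by a few Langevin corrector steps at each noise scale, with each scale warm-started from the previous, noisier one. This is an illustrative toy, not the paper's analyzed algorithm: it assumes a variance-exploding noise schedule, and it stands in the exact score of a Gaussian toy target for the learned $L^2$-accurate score estimate. All names, parameters, and the schedule are hypothetical choices for the example.

```python
import numpy as np

# Stand-in for a learned score estimate s(x, sigma) ~ grad ln p_sigma(x).
# Toy target: data ~ N(0, I), so p_sigma = N(0, (1 + sigma^2) I) and the
# exact score is -x / (1 + sigma^2). A trained network would replace this.
def score_estimate(x, sigma):
    return -x / (1.0 + sigma**2)

def predictor_corrector_sample(dim, sigmas, n_corrector=5, corrector_step=1e-2, rng=None):
    """Annealed predictor-corrector sampling (illustrative sketch).

    sigmas: decreasing noise scales (the annealing schedule). The sample at
    each scale serves as the warm start for the next, less-noisy scale.
    """
    rng = np.random.default_rng() if rng is None else rng
    # White-noise initialization at the largest noise scale.
    x = sigmas[0] * rng.standard_normal(dim)
    for i in range(len(sigmas) - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        # Predictor: one Euler step of the (variance-exploding) reverse
        # diffusion, moving from noise level sigma down to sigma_next.
        dvar = sigma**2 - sigma_next**2
        x = x + dvar * score_estimate(x, sigma) + np.sqrt(dvar) * rng.standard_normal(dim)
        # Corrector: a few Langevin steps targeting p_{sigma_next}, driven
        # only by the score estimate -- the "core mechanic" in the abstract.
        for _ in range(n_corrector):
            x = (x + corrector_step * score_estimate(x, sigma_next)
                 + np.sqrt(2 * corrector_step) * rng.standard_normal(dim))
    return x

# Example: geometric annealing schedule from sigma = 10 down to sigma = 0.01.
samples = np.stack([predictor_corrector_sample(2, np.geomspace(10.0, 0.01, 50))
                    for _ in range(1000)])
print(samples.mean(axis=0), samples.std(axis=0))  # ~0 mean, ~1 std for the toy target
```

In this sketch the corrector alone is plain Langevin dynamics and the predictor alone is a discretized reverse diffusion; the abstract's claim is that combining both, together with the annealed warm starts, yields better convergence than either portion on its own.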