Score-based model research in recent years has produced state-of-the-art generative models by employing Gaussian denoising score matching (DSM). However, the Gaussian noise assumption has several limitations in high dimensions, motivating a more concrete route toward even higher-dimensional PDF estimation in future work. We outline these limitations before extending the theory to a broader family of noising distributions, namely the generalised normal distribution. To ground this theoretically, we relax a key assumption in (denoising) score matching theory, demonstrating that distributions which are differentiable \textit{almost everywhere} permit the same objective simplification as Gaussians. For noise vector length distributions, we demonstrate favourable concentration of measure in the high-dimensional spaces prevalent in deep learning. In the process, we uncover a skewed noise vector length distribution and develop an iterative noise scaling algorithm to consistently initialise the multiple noise levels in annealed Langevin dynamics. On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
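For concreteness, a minimal sketch of the noising family and objective referred to above, in our own notation (not necessarily the paper's): the univariate generalised normal density with location $x$, scale $\alpha > 0$ and shape $\beta > 0$, applied as the perturbation kernel, alongside the standard denoising score matching objective it would be plugged into:
\begin{equation*}
q_{\alpha,\beta}(\tilde{x}\mid x) \;=\; \frac{\beta}{2\alpha\,\Gamma(1/\beta)}\,\exp\!\left(-\left|\frac{\tilde{x}-x}{\alpha}\right|^{\beta}\right),
\qquad
J_{\mathrm{DSM}}(\theta) \;=\; \mathbb{E}_{p(x)\,q_{\alpha,\beta}(\tilde{x}\mid x)}\!\left[\big\|\,s_\theta(\tilde{x}) \,-\, \nabla_{\tilde{x}}\log q_{\alpha,\beta}(\tilde{x}\mid x)\,\big\|_2^2\right].
\end{equation*}
Setting $\beta = 2$ recovers the Gaussian kernel, while $\beta < 2$ gives heavier tails; $\log q_{\alpha,\beta}$ remains differentiable almost everywhere (its gradient exists for all $\tilde{x} \neq x$), which is the kind of relaxed regularity the abstract refers to.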