Deep neural networks can learn powerful prior probability models for images, as evidenced by the high-quality generations obtained with recent score-based diffusion methods. But the means by which these networks capture complex global statistical structure, apparently without suffering from the curse of dimensionality, remain a mystery. To study this, we incorporate diffusion methods into a multi-scale decomposition, reducing dimensionality by assuming a stationary local Markov model for wavelet coefficients conditioned on coarser-scale coefficients. We instantiate this model using convolutional neural networks (CNNs) with local receptive fields, which enforce both the stationarity and Markov properties. Global structures are captured using a CNN with receptive fields covering the entire (but small) low-pass image. We test this model on a dataset of face images, which are highly non-stationary and contain large-scale geometric structures. Remarkably, denoising, super-resolution, and image synthesis results all demonstrate that these structures can be captured with significantly smaller conditioning neighborhoods than required by a Markov model implemented in the pixel domain. Our results show that score estimation for large complex images can be reduced to low-dimensional Markov conditional models across scales, alleviating the curse of dimensionality.
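The conditional score model described above can be illustrated with a minimal sketch, assuming a PyTorch implementation: a fully convolutional network with small 3x3 kernels (enforcing stationarity and a local Markov neighborhood) that takes noisy wavelet detail coefficients concatenated with the conditioning coarser-scale low-pass image and outputs an estimated score. All class names, channel counts, and hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a local conditional score network (not the authors' code).
import torch
import torch.nn as nn

class LocalConditionalScoreCNN(nn.Module):
    def __init__(self, detail_channels=3, hidden=64, num_layers=5):
        super().__init__()
        # Input: noisy detail subbands concatenated with the conditioning
        # low-pass image (assumed here to share the same spatial resolution).
        layers = [nn.Conv2d(detail_channels + 1, hidden, 3, padding=1), nn.ReLU()]
        # Each additional 3x3 layer grows the receptive field by only 2 pixels
        # per side, keeping the conditioning neighborhood small and local.
        for _ in range(num_layers - 2):
            layers += [nn.Conv2d(hidden, hidden, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(hidden, detail_channels, 3, padding=1)]
        # Fully convolutional, no spatial pooling: the same local function is
        # applied at every position, enforcing stationarity.
        self.net = nn.Sequential(*layers)

    def forward(self, noisy_detail, lowpass):
        # noisy_detail: (B, C, H, W) noisy wavelet detail coefficients
        # lowpass:      (B, 1, H, W) coarser-scale low-pass conditioning image
        x = torch.cat([noisy_detail, lowpass], dim=1)
        return self.net(x)  # estimated conditional score
```

The global structure of the low-pass image would instead be modeled by a separate CNN whose receptive field spans the entire (small) low-pass image, as stated in the abstract.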