Diffusion-based generative models are extremely effective at generating high-quality images, and under several metrics their samples often surpass those produced by other models. One distinguishing feature of these models, however, is that they typically require long sampling chains to produce high-fidelity images. This is a challenge not only for sampling time but also because backpropagating through these chains is inherently difficult, which hampers tasks such as model inversion, i.e., approximately finding latent states that generate known images. In this paper, we look at diffusion models from a different perspective, that of a (deep) equilibrium (DEQ) fixed point model. Specifically, we extend the recent denoising diffusion implicit model (DDIM; Song et al. 2020) and model the entire sampling chain as a joint, multivariate fixed point system. This setup provides an elegant unification of diffusion and equilibrium models, and shows benefits in 1) single image sampling, as it replaces the typical fully serial sampling process with a parallel one; and 2) model inversion, where we can leverage fast gradients in the DEQ setting to find the noise that generates a given image much more quickly. The approach is also orthogonal, and thus complementary, to other methods for reducing sampling time or improving model inversion. We demonstrate our method's strong performance across several datasets, including CIFAR10, CelebA, and LSUN Bedrooms and Churches.
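To make "the entire sampling chain as a joint fixed point system" concrete, here is a minimal sketch. It is not the authors' implementation. The states are scalars, the noise predictor `toy_eps` and the cosine-style schedule `alphas` are hypothetical stand-ins for a trained network and a real schedule, and the solver is plain Jacobi fixed-point iteration rather than whatever solver the paper uses. Each deterministic DDIM step (eta = 0) defines one equation x_{t-1} = f_t(x_t). The sketch solves all T equations jointly by updating every state from the previous iterate. Because the T model calls within a sweep are independent of each other, they could run in parallel.

```python
import math
from typing import Callable, List, Tuple

# eps(x, t): noise predictor; in practice a trained network (here hypothetical).
EpsFn = Callable[[float, int], float]


def ddim_step(x_t: float, t: int, alphas: List[float], eps_fn: EpsFn) -> float:
    """Deterministic DDIM update (eta = 0) mapping x_t to x_{t-1}.

    alphas[t] plays the role of the cumulative product alpha-bar_t.
    """
    a_t, a_prev = alphas[t], alphas[t - 1]
    eps = eps_fn(x_t, t)
    x0_pred = (x_t - math.sqrt(1.0 - a_t) * eps) / math.sqrt(a_t)
    return math.sqrt(a_prev) * x0_pred + math.sqrt(1.0 - a_prev) * eps


def sequential_sample(x_T: float, alphas: List[float], eps_fn: EpsFn) -> List[float]:
    """Standard serial sampling: T model calls, each waiting on the previous one."""
    T = len(alphas) - 1
    xs = [0.0] * (T + 1)
    xs[T] = x_T
    for t in range(T, 0, -1):
        xs[t - 1] = ddim_step(xs[t], t, alphas, eps_fn)
    return xs


def fixed_point_sample(
    x_T: float, alphas: List[float], eps_fn: EpsFn,
    max_iter: int = 100, tol: float = 1e-12,
) -> Tuple[List[float], int]:
    """Solve x_{t-1} = ddim_step(x_t) for all t jointly by Jacobi iteration.

    Every step in a sweep reads only the previous iterate, so the T calls
    inside one sweep do not depend on each other.
    """
    T = len(alphas) - 1
    xs = [x_T] * (T + 1)  # initial guess: every state set to the noise x_T
    for k in range(1, max_iter + 1):
        new_xs = [ddim_step(xs[t], t, alphas, eps_fn) for t in range(1, T + 1)]
        new_xs.append(x_T)  # x_T is given, not a variable of the system
        residual = max(abs(a - b) for a, b in zip(new_xs, xs))
        xs = new_xs
        if residual < tol:
            return xs, k
    return xs, max_iter


# Usage with a hypothetical schedule and toy predictor.
T = 10
alphas = [math.cos(0.5 * math.pi * t / (T + 1)) ** 2 for t in range(T + 1)]
toy_eps = lambda x, t: math.tanh(x) * (1.0 + 0.1 * t)

seq = sequential_sample(0.7, alphas, toy_eps)
par, sweeps = fixed_point_sample(0.7, alphas, toy_eps)
print(max(abs(a - b) for a, b in zip(seq, par)), sweeps)
```

Both routines reach the same chain. Because the system is triangular, plain Jacobi needs at most T + 1 sweeps. Any speedup in this sketch therefore comes from batching each sweep's independent calls, or from swapping in an accelerated fixed-point solver that converges in fewer sweeps.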