Generative models have recently advanced significantly, owing largely to diffusion models. The success of these models can often be attributed to guidance techniques, such as classifier guidance and classifier-free guidance, which provide effective mechanisms for trading off fidelity against diversity. However, these methods cannot guide a generated image to be aware of its geometric configuration, e.g., depth, which hinders their application to areas that require a certain level of depth awareness. To address this limitation, we propose a novel guidance method for diffusion models that uses estimated depth information derived from the rich intermediate representations of diffusion models. We first present a label-efficient depth estimation framework that uses the internal representations of diffusion models. We then incorporate two guidance techniques during the sampling phase, one based on pseudo-labeling and one based on a depth-domain diffusion prior, to self-condition the generated image on the estimated depth map. Experiments and comprehensive ablation studies demonstrate the effectiveness of our method in guiding diffusion models towards generating geometrically plausible images.
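For readers unfamiliar with the guidance techniques mentioned above, the following is a minimal sketch of the classifier-free guidance combination step in one common parameterization. It is not the depth-aware guidance proposed here. The function name `classifier_free_guidance` and its arguments are hypothetical, and the noise predictions are plain lists of floats standing in for the outputs of a denoising network.

```python
from typing import List


def classifier_free_guidance(
    eps_uncond: List[float],
    eps_cond: List[float],
    scale: float,
) -> List[float]:
    """Blend unconditional and conditional noise predictions.

    Assumed parameterization: eps = eps_uncond + scale * (eps_cond - eps_uncond).
    scale = 0 gives the unconditional prediction, scale = 1 the conditional one,
    and scale > 1 extrapolates past the conditional prediction, which typically
    favors fidelity over diversity.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy values standing in for network outputs (illustrative only).
guided = classifier_free_guidance([0.0, 1.0], [1.0, 3.0], scale=2.0)
```

The single `scale` parameter is what exposes the fidelity versus diversity trade-off: larger values push each sampling step further in the direction indicated by the condition.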