In recent years, generative models have advanced significantly owing to the success of diffusion models. This success is often attributed to guidance techniques, such as classifier guidance and classifier-free guidance, which provide effective mechanisms for trading off fidelity against diversity. However, these methods cannot make a generated image aware of its geometric configuration, e.g., depth, which hinders the application of diffusion models to areas that require depth awareness. To address this limitation, we propose a novel guidance approach for diffusion models that uses depth information estimated from the rich intermediate representations of diffusion models. To this end, we first present a label-efficient depth estimation framework built on the internal representations of diffusion models. At the sampling phase, we use two guidance techniques to self-condition the generated image on the estimated depth map: the first uses pseudo-labeling, and the second uses a depth-domain diffusion prior. Experiments and extensive ablation studies demonstrate the effectiveness of our method in guiding diffusion models toward geometrically plausible image generation. The project page is available at https://ku-cvlab.github.io/DAG/.