Denoising diffusion probabilistic models have recently received much research attention since they outperform alternative approaches, such as GANs, and currently provide state-of-the-art generative performance. The superior performance of diffusion models has made them an appealing tool in several applications, including inpainting, super-resolution, and semantic editing. In this paper, we demonstrate that diffusion models can also serve as an instrument for semantic segmentation, especially in settings where labeled data is scarce. In particular, for several pretrained diffusion models, we investigate the intermediate activations from the networks that perform the Markov step of the reverse diffusion process. We show that these activations effectively capture semantic information from an input image and appear to be excellent pixel-level representations for the segmentation problem. Based on these observations, we describe a simple segmentation method that can work even if only a few training images are provided. Our approach significantly outperforms the existing alternatives on several datasets given the same amount of human supervision.
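To make the idea of pixel-level representations concrete, the following is a minimal sketch of how multi-scale activations might be turned into per-pixel feature vectors and used to train a classifier from a single labeled image. It is not the method evaluated in the paper: the activations here are synthetic stand-ins produced by a hypothetical `fake_activations` helper rather than extracted from a pretrained diffusion model, and the linear softmax classifier, the nearest-neighbour upsampling, and all function names are assumptions chosen for brevity.

```python
import numpy as np

def upsample_nearest(fmap, size):
    """Nearest-neighbour upsample a (C, h, w) feature map to (C, size, size)."""
    _, h, w = fmap.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return fmap[:, rows[:, None], cols[None, :]]

def pixel_features(fmaps, size):
    """Upsample every map, concatenate channels, return (size*size, channels)."""
    stacked = np.concatenate([upsample_nearest(f, size) for f in fmaps], axis=0)
    return stacked.reshape(stacked.shape[0], -1).T

def train_softmax(X, y, n_classes, lr=0.5, steps=300):
    """Fit a linear softmax classifier on per-pixel features by gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(X)
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)

def fake_activations(mask, rng):
    """HYPOTHETICAL stand-in for network activations: noisy mask copies at two scales."""
    coarse = mask[::2, ::2][None].repeat(4, axis=0) + 0.1 * rng.standard_normal((4, 8, 8))
    fine = np.stack([mask, np.zeros_like(mask)]) + 0.1 * rng.standard_normal((2, 16, 16))
    return [coarse, fine]

rng = np.random.default_rng(0)

# One labeled "training image": left half is class 0, right half is class 1.
train_mask = np.zeros((16, 16))
train_mask[:, 8:] = 1
X_train = pixel_features(fake_activations(train_mask, rng), size=16)
y_train = train_mask.astype(int).ravel()

# One held-out image with a different layout: top half class 0, bottom half class 1.
test_mask = np.zeros((16, 16))
test_mask[8:, :] = 1
X_test = pixel_features(fake_activations(test_mask, rng), size=16)
y_test = test_mask.astype(int).ravel()

W, b = train_softmax(X_train, y_train, n_classes=2)
acc = float((predict(X_test, W, b) == y_test).mean())
print(f"held-out pixel accuracy: {acc:.3f}")
```

Because every pixel of a labeled image contributes one training example, even a single annotated image yields hundreds of samples for the classifier, which is one plausible reason such a scheme can cope with very few labeled images. With real activations, the `fake_activations` call would be replaced by features read from the pretrained network.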