Diverse image completion, the problem of generating multiple plausible ways to fill incomplete regions (i.e., holes) of an image, has seen remarkable progress. However, handling input images with large holes remains challenging because semantically important structures are lost. In this paper, we tackle this problem by incorporating explicit structural guidance. We propose a structure-guided diffusion model (SGDM) for the large-hole diverse completion problem. SGDM consists of a structure generator and a texture generator, both of which are diffusion probabilistic models (DMs). The structure generator produces an edge image representing a plausible structure within the holes, which is then used to guide the texture generation process. To train these two generators jointly, we design a strategy that combines optimal Bayesian denoising and a momentum framework. Beyond improving quality, the auxiliary edge images produced by the structure generator can be manually edited, enabling user-guided image editing. Our experiments on datasets of faces (CelebA-HQ) and natural scenes (Places) show that our method achieves a comparable or superior trade-off between visual quality and diversity compared to other state-of-the-art methods.
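The two-stage design described above (edges first, then texture conditioned on those edges) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes standard DDPM ancestral sampling with a linear noise schedule, and the names `ddpm_sample`, `complete_image`, and `toy_eps_model` are hypothetical. The noise predictor is a zero-output placeholder standing in for a trained network, and the joint training strategy (optimal Bayesian denoising with a momentum framework) is not modeled here.

```python
import numpy as np

def ddpm_sample(eps_model, cond, shape, rng, num_steps=50):
    """Generic DDPM ancestral sampling (assumed sampler, linear beta schedule).

    eps_model(x_t, t, cond) is expected to return a noise estimate shaped like x_t.
    """
    betas = np.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = eps_model(x, t, cond)
        # Posterior mean of x_{t-1} given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step.
        noise = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        x = mean + np.sqrt(betas[t]) * noise
    return x

def toy_eps_model(x_t, t, cond):
    """Hypothetical placeholder for a trained denoising network."""
    return np.zeros_like(x_t)

def complete_image(image, mask, structure_eps, texture_eps, rng):
    """Two-stage completion sketch: sample an edge map, then texture guided by it.

    image: (H, W, 3) array; mask: (H, W) array with 1 inside the hole, 0 elsewhere.
    """
    hole = mask[..., None]
    masked_image = image * (1.0 - hole)
    # Stage 1: structure generator proposes an edge image for the hole.
    edge = ddpm_sample(structure_eps, (masked_image, mask), mask.shape, rng)
    # Stage 2: texture generator is conditioned on the sampled edge image.
    texture = ddpm_sample(texture_eps, (masked_image, mask, edge), image.shape, rng)
    # Composite: keep known pixels, use generated content only inside the hole.
    return masked_image + texture * hole, edge

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0
out_a, edge_a = complete_image(image, mask, toy_eps_model, toy_eps_model, np.random.default_rng(1))
out_b, edge_b = complete_image(image, mask, toy_eps_model, toy_eps_model, np.random.default_rng(2))
```

Different random seeds yield different fillings of the same hole, which is the sense in which completion is diverse. Because the edge image is an explicit intermediate, a user-edited edge map could in principle be passed to the second stage in place of the sampled one.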