Restoring plausible and realistic content for arbitrary missing regions in images is an important yet challenging task. Although recent image inpainting models have made significant progress in generating vivid visual details, they can still produce blurred textures or distorted structures in complex scenes because of contextual ambiguity. To address this issue, we propose the Semantic Pyramid Network (SPN), motivated by the idea that learning multi-scale semantic priors from specific pretext tasks can greatly benefit the recovery of locally missing content in images. SPN consists of two components. The first is a prior learner, which distills semantic priors from a pretext model into a multi-scale feature pyramid, achieving a consistent understanding of the global context and local structures. Within the prior learner, we present an optional module for variational inference that enables probabilistic image inpainting driven by various learned priors. The second component is a fully context-aware image generator, which adaptively and progressively refines low-level visual representations at multiple scales using the (stochastic) prior pyramid. We train the prior learner and the image generator as a unified model without any post-processing. Our approach achieves state-of-the-art results on multiple datasets, including Places2, Paris StreetView, CelebA, and CelebA-HQ, under both deterministic and probabilistic inpainting setups.
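The idea of distilling priors at several scales can be illustrated with a minimal, dependency-free sketch. It is not the SPN implementation: the abstract does not state the distillation loss, the pooling scheme, or the form of the variational module, so the per-scale L1 distance, the 2x2 average pooling, and the Gaussian reparameterized sample below are assumptions, and all function names (`build_pyramid`, `distillation_loss`, `sample_prior`) are hypothetical.

```python
import math
import random


def avg_pool2x(grid):
    """Downsample a 2D list by averaging non-overlapping 2x2 blocks."""
    h, w = len(grid) // 2, len(grid[0]) // 2
    return [
        [
            (grid[2 * i][2 * j] + grid[2 * i][2 * j + 1]
             + grid[2 * i + 1][2 * j] + grid[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w)
        ]
        for i in range(h)
    ]


def build_pyramid(grid, levels):
    """Return [full-res, 1/2-res, 1/4-res, ...] with `levels` entries."""
    pyramid = [grid]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2x(pyramid[-1]))
    return pyramid


def l1_distance(a, b):
    """Mean absolute difference between two equally sized 2D lists."""
    n = len(a) * len(a[0])
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def distillation_loss(student_pyramid, teacher_pyramid):
    """Assumed loss: sum of per-scale L1 distances between the learned
    prior pyramid (student) and the pretext-model features (teacher)."""
    return sum(l1_distance(s, t) for s, t in zip(student_pyramid, teacher_pyramid))


def sample_prior(mean, log_var, rng):
    """Assumed stochastic prior: elementwise mean + sigma * eps, eps ~ N(0, 1)."""
    return [
        [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(rm, rlv)]
        for rm, rlv in zip(mean, log_var)
    ]


# Toy stand-in for a pretext-model feature map (a single 4x4 channel).
teacher_map = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]
teacher = build_pyramid(teacher_map, levels=3)   # resolutions 4x4, 2x2, 1x1
student = build_pyramid([[0.0] * 4 for _ in range(4)], levels=3)
loss = distillation_loss(student, teacher)
```

In this toy setup the student pyramid is all zeros, so the loss equals the sum of the mean teacher activations across the three scales. In a real model the student pyramid would come from a trainable network applied to the masked image, and each scale would carry many channels rather than one.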