Language-guided image inpainting aims to fill in the defective regions of an image under the guidance of text while keeping the non-defective regions unchanged. However, the encoding process of existing models suffers from either receptive spreading of defective regions or information loss in non-defective regions, giving rise to visually unappealing inpainting results. To address these issues, this paper proposes N\"UWA-LIP, which incorporates a defect-free VQGAN (DF-VQGAN) with a multi-perspective sequence-to-sequence module (MP-S2S). In particular, DF-VQGAN introduces relative estimation to control receptive spreading and adopts symmetrical connections to protect information. MP-S2S further enhances visual information from complementary perspectives, including both low-level pixels and high-level tokens. Experiments show that DF-VQGAN is more robust than VQGAN. To evaluate the inpainting performance of our model, we build three open-domain benchmarks, on which N\"UWA-LIP also outperforms recent strong baselines.