Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE

Given an incomplete image without additional constraints, image inpainting naturally allows for multiple solutions as long as they appear plausible. Recently, multiple-solution inpainting methods have been proposed and have shown the potential of generating diverse results. However, these methods have difficulty ensuring the quality of each solution, e.g., they produce distorted structure and/or blurry texture. We propose a two-stage model for diverse inpainting, where the first stage generates multiple coarse results, each of which has a different structure, and the second stage refines each coarse result separately by augmenting texture. The proposed model is inspired by the hierarchical vector quantized variational auto-encoder (VQ-VAE), whose hierarchical architecture disentangles structural and textural information. In addition, the vector quantization in VQ-VAE enables autoregressive modeling of the discrete distribution over the structural information. Sampling from the distribution can easily generate diverse and high-quality structures, making up the first stage of our model. In the second stage, we propose a structural attention module inside the texture generation network, where the module utilizes the structural information to capture distant correlations. We further reuse the VQ-VAE to calculate two feature losses, which help improve structure coherence and texture realism, respectively. Experimental results on the CelebA-HQ, Places2, and ImageNet datasets show that our method not only enhances the diversity of the inpainting solutions but also improves the visual quality of the generated multiple images. Code and models are available at: https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting.
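To make the role of vector quantization concrete, the following is a minimal NumPy sketch of the nearest-codebook lookup that turns continuous features into discrete indices. It is an illustration under stated assumptions, not the authors' implementation: the function name `quantize`, the toy codebook, and the feature values are all hypothetical, and a real VQ-VAE learns the codebook jointly with an encoder and decoder.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry.

    features: (N, D) array of continuous encoder outputs (hypothetical values).
    codebook: (K, D) array of code vectors (learned in a real VQ-VAE).
    Returns (indices, quantized) with shapes (N,) and (N, D).
    """
    # Squared Euclidean distance between every feature and every code: (N, K).
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # The index of the closest code is the discrete symbol for that feature.
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]

# Toy example: three 2-D codes and three features, each lying near one code.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
features = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]])
indices, quantized = quantize(features, codebook)
```

Because each spatial position is reduced to an integer index, a distribution over those indices can be modeled autoregressively, and drawing different samples from it gives different structures.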
