Image completion aims to fill in the missing region of a masked image with plausible content. However, existing image completion methods tend to fill the missing region with the surrounding texture rather than hallucinating a visual instance that fits the context of the scene. In this work, we propose a novel image completion model, dubbed ImComplete, that hallucinates a missing instance that harmonizes well with, and thus preserves, the original context. ImComplete first adopts a transformer architecture that considers the visible instances and the location of the missing region. Then, ImComplete completes the semantic segmentation masks within the missing region, providing pixel-level semantic and structural guidance. Finally, the image synthesis blocks generate photo-realistic content. We comprehensively evaluate the results in terms of visual quality (LPIPS and FID) and contextual preservation (CLIPscore and object detection accuracy) on the COCO-panoptic and Visual Genome datasets. Experimental results show the superiority of ImComplete on various natural images.
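As an illustration of the task setup only, and not of ImComplete itself, the following minimal sketch shows how a completed image is commonly assembled: visible pixels are kept and only the masked region takes generated values. The function name `composite` and the toy 2x2 grayscale values are hypothetical and chosen purely for illustration.

```python
def composite(image, mask, generated):
    """Keep visible pixels; use generated values where mask is True (missing).

    All three arguments are 2D lists of the same shape. This is a
    hypothetical helper for illustration, not part of ImComplete.
    """
    return [
        [gen if missing else pix
         for pix, missing, gen in zip(img_row, mask_row, gen_row)]
        for img_row, mask_row, gen_row in zip(image, mask, generated)
    ]


# Toy example: only the top-right pixel is missing.
image = [[10, 20], [30, 40]]
mask = [[False, True], [False, False]]
generated = [[99, 55], [99, 99]]

result = composite(image, mask, generated)
print(result)  # [[10, 55], [30, 40]]
```

Only the generated value at the masked position reaches the output; generated values at visible positions are discarded.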