Structure-guided image completion aims to inpaint a local region of an image according to an input guidance map from users. While such a task enables many practical applications for interactive editing, existing methods often struggle to hallucinate realistic object instances in complex natural scenes. This limitation is partially due to the lack of semantic-level constraints inside the hole region, as well as the lack of a mechanism to enforce realistic object generation. In this work, we propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects. Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts. Moreover, the object-level discriminators take aligned instances as inputs to enforce the realism of individual objects. Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks, including segmentation-guided completion, edge-guided manipulation, and panoptically-guided manipulation on the Places2 dataset. Furthermore, our trained model is flexible and can support multiple editing use cases, such as object insertion, replacement, removal, and standard inpainting. In particular, our trained model combined with a novel automatic image completion pipeline achieves state-of-the-art results on the standard inpainting task.