We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). Conditional GANs have enabled a variety of applications, but their results are often limited to low resolution and remain far from realistic. In this work, we generate visually appealing 2048 × 1024 results with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual manipulation with two additional features. First, we incorporate object instance segmentation information, which enables object manipulations such as removing or adding objects and changing the object category. Second, we propose a method to generate diverse results given the same input, allowing users to edit the object appearance interactively. Human opinion studies demonstrate that our method significantly outperforms existing methods, advancing both the quality and the resolution of deep image synthesis and editing.