We propose a novel model named Multi-Channel Attention Selection Generative Adversarial Network (SelectionGAN) for guided image-to-image translation, where an input image is translated into another image while respecting external semantic guidance. The proposed SelectionGAN explicitly utilizes the semantic guidance information and consists of two stages. In the first stage, the input image and the conditional semantic guidance are fed into a cycled semantic-guided generation network to produce initial coarse results. In the second stage, we refine the initial results using the proposed multi-scale spatial pooling & channel selection module and the multi-channel attention selection module. Moreover, uncertainty maps automatically learned from the attention maps are used to guide the pixel loss for better network optimization. Extensive experiments on four challenging guided image-to-image translation tasks (face, hand, body, and street view) demonstrate that SelectionGAN generates significantly better results than state-of-the-art methods. Meanwhile, the proposed framework and modules are unified solutions and can be applied to other generation tasks such as semantic image synthesis. The code is available at https://github.com/Ha0Tang/SelectionGAN.
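To make the second-stage fusion concrete, below is a minimal PyTorch sketch of the multi-channel attention selection idea described above: a shared feature map produces several candidate generations plus softmax-normalized attention maps, the candidates are fused by attention-weighted summation, and an uncertainty map derived from the attention maps reweights the pixel loss. The module and function names (AttentionSelection, uncertainty_weighted_l1, num_candidates) and the exact loss weighting are our own assumptions for illustration and do not mirror the official implementation at https://github.com/Ha0Tang/SelectionGAN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionSelection(nn.Module):
    """Fuse N candidate generations with learned spatial attention maps
    and emit an uncertainty map for weighting the pixel loss (a sketch)."""
    def __init__(self, feat_channels: int, num_candidates: int = 10):
        super().__init__()
        self.num_candidates = num_candidates
        # N candidate RGB images generated from the shared features.
        self.to_candidates = nn.Conv2d(feat_channels, 3 * num_candidates, 3, padding=1)
        # N single-channel attention maps, normalized with softmax below.
        self.to_attention = nn.Conv2d(feat_channels, num_candidates, 3, padding=1)
        # One uncertainty map learned from the attention maps.
        self.to_uncertainty = nn.Conv2d(num_candidates, 1, 3, padding=1)

    def forward(self, feat: torch.Tensor):
        b, _, h, w = feat.shape
        candidates = torch.tanh(self.to_candidates(feat))          # (B, 3N, H, W)
        candidates = candidates.view(b, self.num_candidates, 3, h, w)
        attention = F.softmax(self.to_attention(feat), dim=1)      # (B, N, H, W)
        # Attention-weighted sum over the N candidates, pixel by pixel.
        fused = (attention.unsqueeze(2) * candidates).sum(dim=1)   # (B, 3, H, W)
        uncertainty = torch.sigmoid(self.to_uncertainty(attention))
        return fused, uncertainty

def uncertainty_weighted_l1(pred, target, uncertainty, eps=1e-6):
    # Down-weight pixels the network is uncertain about; the exact
    # functional form of the uncertainty-guided loss is an assumption.
    return ((pred - target).abs() * (1.0 - uncertainty + eps)).mean()
```

For example, given a (B, 256, H, W) feature map from the refinement backbone, `AttentionSelection(256)` returns the fused image and its uncertainty map in one forward pass; computing the attention over many candidate generations, rather than a single output, is what lets the network select the best-generated content per spatial location.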