In this paper, we focus on the semantic image synthesis task, which aims at translating semantic label maps to photo-realistic images. Existing methods lack effective semantic constraints to preserve semantic information and ignore the structural correlations in both the spatial and channel dimensions, leading to blurry, artifact-prone results. To address these limitations, we propose a novel Dual Attention GAN (DAGAN) that synthesizes photo-realistic and semantically consistent images with fine details from the input layouts, without imposing extra training overhead or modifying the network architectures of existing methods. We also propose two novel modules, i.e., a position-wise Spatial Attention Module (SAM) and a scale-wise Channel Attention Module (CAM), to capture semantic structure attention in the spatial and channel dimensions, respectively. Specifically, SAM selectively correlates the pixels at each position through a spatial attention map, so that pixels sharing the same semantic label become related to each other regardless of their spatial distance. Meanwhile, CAM selectively emphasizes the scale-wise features at each channel through a channel attention map, which integrates associated features among all channel maps regardless of their scales. Finally, we sum the outputs of SAM and CAM to further improve the feature representation. Extensive experiments on four challenging datasets show that DAGAN achieves remarkably better results than state-of-the-art methods while using fewer model parameters. The source code and trained models are available at https://github.com/Ha0Tang/DAGAN.
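The abstract describes SAM as relating every pixel to every other pixel via a spatial attention map, CAM as relating channel maps via a channel attention map, and a final sum of the two outputs. The following is a minimal, illustrative NumPy sketch of that dual-attention idea only; it omits the learned projection convolutions, learnable fusion scales, and all GAN machinery of the actual DAGAN implementation, and the function names are our own.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x):
    """Position-wise attention: each pixel aggregates features from all
    positions, weighted by pairwise pixel affinities (a sketch of SAM)."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)               # (C, N) with N = H*W
    attn = softmax(flat.T @ flat, axis=-1)   # (N, N) pixel-to-pixel affinities
    out = flat @ attn.T                      # aggregate over all positions
    return x + out.reshape(c, h, w)          # residual connection

def channel_attention(x):
    """Scale-wise attention: each channel map aggregates features from all
    channels, weighted by channel affinities (a sketch of CAM)."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)
    attn = softmax(flat @ flat.T, axis=-1)   # (C, C) channel affinities
    out = attn @ flat                        # integrate across channel maps
    return x + out.reshape(c, h, w)

def dual_attention(x):
    # As in the abstract, the two attention outputs are summed.
    return spatial_attention(x) + channel_attention(x)

feats = np.random.rand(8, 4, 4).astype(np.float32)  # toy (C, H, W) feature map
fused = dual_attention(feats)
print(fused.shape)  # shape is preserved: (8, 4, 4)
```

Note that because the spatial attention map is computed over all pixel pairs, two pixels with the same semantic label can reinforce each other no matter how far apart they are, which is the property the abstract highlights.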