Generating images from natural language descriptions is a challenging task. In this work, we propose the Combined Attention Generative Adversarial Network (CAGAN) to generate photo-realistic images from textual descriptions. The proposed CAGAN utilises two attention models: word attention, to draw different sub-regions conditioned on related words, and squeeze-and-excitation attention, to capture non-linear interactions among channels. With spectral normalisation to stabilise training, our proposed CAGAN improves the state of the art on the Inception Score (IS) and Fréchet Inception Distance (FID) on the CUB dataset, and on the FID on the more challenging COCO dataset. Furthermore, we demonstrate that judging a model by a single evaluation metric can be misleading: an additional model we develop with local self-attention achieves a higher IS, outperforming the state of the art on the CUB dataset, yet generates unrealistic images through feature repetition.
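The squeeze-and-excitation attention mentioned above gates each feature channel by a learned scalar computed from global channel statistics. A minimal dependency-free sketch of that gating follows; the weight matrices `w1` and `w2` are illustrative placeholders, not the paper's trained parameters, and the feature map is represented as plain nested lists of shape `[C][H][W]`.

```python
import math

def se_attention(features, w1, w2):
    """Squeeze-and-excitation gating over channels (illustrative sketch).

    features: [C][H][W] feature map as nested lists.
    w1: [C_reduced][C] and w2: [C][C_reduced] bottleneck weights (assumed).
    """
    n_channels = len(features)
    # Squeeze: global average pooling, one scalar per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid gate per channel.
    h = [max(0.0, sum(w1[i][j] * z[j] for j in range(n_channels)))
         for i in range(len(w1))]
    s = [1.0 / (1.0 + math.exp(-sum(w2[i][j] * h[j] for j in range(len(h)))))
         for i in range(n_channels)]
    # Re-weight: scale every value in channel c by its gate s[c],
    # modelling the non-linear channel interactions described above.
    return [[[s[c] * v for v in row] for row in ch]
            for c, ch in enumerate(features)]

# Tiny usage example: 2 channels of 2x2 features, bottleneck of size 1.
feats = [[[1.0, 1.0], [1.0, 1.0]],
         [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[1.0, 0.0]]          # squeeze 2 channels down to 1
w2 = [[0.0], [0.0]]        # zero weights -> sigmoid(0) = 0.5 gate on both
out = se_attention(feats, w1, w2)
```

With these toy weights both channel gates are `sigmoid(0) = 0.5`, so each channel is halved; in the trained network the gates differ per channel, emphasising informative channels and suppressing the rest.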