In this paper, we propose Object-driven Attentive Generative Adversarial Networks (Obj-GANs) that allow object-centered text-to-image synthesis for complex scenes. Following the two-step (layout-image) generation process, a novel object-driven attentive image generator is proposed to synthesize salient objects by paying attention to the most relevant words in the text description and the pre-generated semantic layout. In addition, a new Fast R-CNN-based object-wise discriminator is proposed to provide rich object-wise discrimination signals on whether the synthesized object matches the text description and the pre-generated layout. The proposed Obj-GAN significantly outperforms the previous state of the art in various metrics on the large-scale COCO benchmark, increasing the Inception score by 27% and decreasing the FID score by 11%. A thorough comparison between the traditional grid attention and the new object-driven attention is provided by analyzing their mechanisms and visualizing their attention layers, offering insight into how the proposed model generates complex scenes in high quality.
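To make the object-driven attention idea concrete, the sketch below illustrates one plausible reading of it: each object in the layout uses its class-label embedding as a query that attends over the word embeddings of the description, producing a per-object context vector that conditions generation. This is a minimal, hypothetical sketch, not the authors' released code; the function name, `proj` layer, and tensor shapes are assumptions for illustration.

```python
# Minimal sketch of object-driven attention (hypothetical; not the authors' code).
import torch
import torch.nn.functional as F


def object_driven_attention(label_emb, word_emb, proj):
    """
    label_emb: (B, num_objects, d_l) -- embeddings of object class labels
    word_emb:  (B, num_words, d_w)   -- word embeddings of the description
    proj:      nn.Linear(d_w, d_l)   -- maps words into the label space
    Returns per-object context vectors (B, num_objects, d_w)
    and attention weights (B, num_objects, num_words).
    """
    keys = proj(word_emb)                                # (B, T, d_l)
    scores = torch.bmm(label_emb, keys.transpose(1, 2))  # (B, O, T)
    attn = F.softmax(scores, dim=-1)                     # attend over words
    context = torch.bmm(attn, word_emb)                  # (B, O, d_w)
    return context, attn


if __name__ == "__main__":
    B, O, T, d_l, d_w = 2, 3, 10, 64, 128
    proj = torch.nn.Linear(d_w, d_l)
    ctx, attn = object_driven_attention(
        torch.randn(B, O, d_l), torch.randn(B, T, d_w), proj)
    print(ctx.shape, attn.shape)  # (2, 3, 128) (2, 3, 10)
```

The key contrast with traditional grid attention is the query: grid attention queries with features at each spatial location, whereas here the queries are tied to the objects in the pre-generated layout, so each object gathers the words most relevant to it.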