LatteGAN: Visually Guided Language Attention for Multi-Turn Text-Conditioned Image Manipulation

Text-guided image manipulation tasks have recently gained attention in the vision-and-language community. While most of the prior studies focused on single-turn manipulation, our goal in this paper is to address the more challenging multi-turn image manipulation (MTIM) task. Previous models for this task successfully generate images iteratively, given a sequence of instructions and a previously generated image. However, this approach suffers from under-generation and a lack of generated quality of the objects that are described in the instructions, which consequently degrades the overall performance. To overcome these problems, we present a novel architecture called a Visually Guided Language Attention GAN (LatteGAN). Here, we address the limitations of the previous approaches by introducing a Visually Guided Language Attention (Latte) module, which extracts fine-grained text representations for the generator, and a Text-Conditioned U-Net discriminator architecture, which discriminates both the global and local representations of fake or real images. Extensive experiments on two distinct MTIM datasets, CoDraw and i-CLEVR, demonstrate the state-of-the-art performance of the proposed model.

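The abstract says only that the Latte module extracts fine-grained text representations under visual guidance; it does not give the module's internals. As a minimal sketch of what "visually guided language attention" could look like, the example below lets each image location act as a query over the word vectors of an instruction. Scaled dot-product attention, the function names, and the toy vectors are all assumptions made for illustration, not the paper's actual design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def visually_guided_text_attention(image_feats, word_feats):
    """Hypothetical sketch: each image location attends over the words.

    image_feats: list of N location vectors (queries), each of dimension d
    word_feats:  list of T word vectors (keys and values), each of dimension d
    Returns (attended, weights): N text vectors of dimension d, and the
    N x T attention weights.
    """
    d = len(word_feats[0])
    scale = math.sqrt(d)  # assumed scaled dot-product scoring
    attended, weights = [], []
    for q in image_feats:
        # Similarity between this image location and every word.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale
                  for k in word_feats]
        w = softmax(scores)
        # Location-specific text vector: weighted sum of the word vectors.
        ctx = [sum(w[t] * word_feats[t][j] for t in range(len(word_feats)))
               for j in range(d)]
        attended.append(ctx)
        weights.append(w)
    return attended, weights


# Toy inputs (made up): two image locations, three instruction words.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
word_feats = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
attended, weights = visually_guided_text_attention(image_feats, word_feats)
```

In this toy setup the first location puts most of its weight on the first word and the second location on the second word, so each spatial position receives its own text representation rather than a single sentence-level vector.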

Related content

Attention mechanism


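The attention mechanism was first proposed in computer vision, but it became widely popular with the Google DeepMind paper "Recurrent Models of Visual Attention" [14], which applied attention to an RNN model for image classification. Subsequently, Bahdanau et al., in "Neural Machine Translation by Jointly Learning to Align and Translate" [1], used an attention-like mechanism to perform translation and alignment jointly in machine translation; their work is regarded as the first to apply attention to NLP. Similar attention-based extensions of RNN models then began to be applied to a wide range of NLP tasks. More recently, how to use attention in CNNs has also become an active research topic. The figure below shows the general trend of attention research.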

[CVPR 2022] Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation, Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning

Zhuanzhi Member Service

7+ reads · March 19, 2022

[Paper translation] Translation of a survey of attention mechanisms in NLP, Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

Zhuanzhi Member Service

96+ reads · April 18, 2020

[A collection of cross-lingual BERT models] Transfer learning is increasingly going multilingual with language-specific BERT models

Zhuanzhi Member Service

54+ reads · January 30, 2020

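Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis, Prof. Xiaojun Quan, School of Data and Computer Science, Sun Yat-sen University, 8th National Conference on Social Media Processing (SMP 2019)

Zhuanzhi Member Service

19+ reads · October 22, 2019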