LatteGAN: Visually Guided Language Attention for Multi-Turn Text-Conditioned Image Manipulation - Zhuanzhi Papers


June 2, 2022

LatteGAN: Visually Guided Language Attention for Multi-Turn Text-Conditioned Image Manipulation

Shoya Matsumori, Yuki Abe, Kosuke Shingyouchi, Komei Sugiura, Michita Imai

Text-guided image manipulation tasks have recently gained attention in the vision-and-language community. While most prior studies focused on single-turn manipulation, our goal in this paper is to address the more challenging multi-turn image manipulation (MTIM) task. Previous models for this task successfully generate images iteratively, given a sequence of instructions and a previously generated image. However, this approach suffers from under-generation and from low quality of the generated objects described in the instructions, which consequently degrades overall performance. To overcome these problems, we present a novel architecture called the Visually Guided Language Attention GAN (LatteGAN). Here, we address the limitations of previous approaches by introducing a Visually Guided Language Attention (Latte) module, which extracts fine-grained text representations for the generator, and a Text-Conditioned U-Net discriminator architecture, which discriminates both the global and local representations of fake or real images. Extensive experiments on two distinct MTIM datasets, CoDraw and i-CLEVR, demonstrate the state-of-the-art performance of the proposed model.
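The abstract does not spell out how the Latte module computes its fine-grained text representations. As a rough illustration only, the sketch below shows one common form of visually guided attention over language: each image-region feature acts as a query over the word features of an instruction, yielding a per-region text vector. The function name `visually_guided_text_attention`, the scaled dot-product scoring, the absence of learned projections, and the toy inputs are all assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def visually_guided_text_attention(image_feats, word_feats):
    """Hypothetical sketch: image regions attend over instruction words.

    image_feats: list of N region vectors, each of dimension d (queries).
    word_feats:  list of T word vectors, each of dimension d (keys and values).
    Returns (attended, weights), where attended[i] is the attention-weighted
    average of the word vectors for region i, and weights[i] is the
    corresponding distribution over the T words.
    """
    d = len(word_feats[0])
    scale = math.sqrt(d)
    attended, weights = [], []
    for q in image_feats:
        # Scaled dot-product score between this region and every word.
        scores = [sum(a * b for a, b in zip(q, w)) / scale for w in word_feats]
        probs = softmax(scores)
        # Weighted average of the word vectors, one component at a time.
        ctx = [sum(p * w[j] for p, w in zip(probs, word_feats)) for j in range(d)]
        weights.append(probs)
        attended.append(ctx)
    return attended, weights


# Toy example: two image regions, three words, feature dimension 2.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
word_feats = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
attended, weights = visually_guided_text_attention(image_feats, word_feats)
```

In this toy setup the first region puts most of its weight on the first word and the second region on the second word, which is the kind of region-specific text representation a generator could consume. A real model would add learned query, key, and value projections and operate on batched tensors.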
