Humans can describe image contents at whatever level of detail they wish, from coarse to fine. However, most image captioning models are intention-agnostic: they cannot actively generate diverse descriptions according to different user intentions. In this work, we propose the Abstract Scene Graph (ASG) structure to represent user intention at a fine-grained level and to control both what the generated description covers and how detailed it is. An ASG is a directed graph consisting of three types of \textbf{abstract nodes} (object, attribute, relationship) grounded in the image without any concrete semantic labels, so it is easy to obtain either manually or automatically. We then propose a novel ASG2Caption model that recognises user intentions and semantics in the graph and generates the desired captions following the graph structure. Conditioned on ASGs, our model achieves better controllability than carefully designed baselines on both the VisualGenome and MSCOCO datasets. It also significantly improves caption diversity by automatically sampling diverse ASGs as control signals.
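As a rough illustration of the idea, the sketch below shows one possible way to represent an ASG in code: a directed graph whose nodes carry only an abstract type (object, attribute, or relationship) and an optional image grounding, never a concrete semantic label. This is an assumption-laden sketch, not the paper's implementation; all class and method names here are hypothetical.

```python
# Minimal sketch (not from the paper) of an Abstract Scene Graph (ASG):
# nodes have only a type and an optional image region; no word labels.
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple

class NodeType(Enum):
    OBJECT = "object"              # "describe this thing"
    ATTRIBUTE = "attribute"        # "describe a property of an object"
    RELATIONSHIP = "relationship"  # "describe how two objects relate"

@dataclass
class ASGNode:
    node_id: int
    node_type: NodeType
    # Grounding in the image (e.g. a normalized bounding box); no class label.
    region: Optional[Tuple[float, float, float, float]] = None

@dataclass
class AbstractSceneGraph:
    nodes: Dict[int, ASGNode] = field(default_factory=dict)
    edges: List[Tuple[int, int]] = field(default_factory=list)  # directed (src, dst)

    def add_object(self, node_id: int, region: Tuple[float, float, float, float]) -> None:
        self.nodes[node_id] = ASGNode(node_id, NodeType.OBJECT, region)

    def add_attribute(self, node_id: int, object_id: int) -> None:
        # Attribute node attached to an existing object node.
        self.nodes[node_id] = ASGNode(node_id, NodeType.ATTRIBUTE)
        self.edges.append((object_id, node_id))

    def add_relationship(self, node_id: int, subject_id: int, object_id: int) -> None:
        # Relationship node connecting a subject object to another object.
        self.nodes[node_id] = ASGNode(node_id, NodeType.RELATIONSHIP)
        self.edges.append((subject_id, node_id))
        self.edges.append((node_id, object_id))

# Example intent: "an <object with one attribute> <relationship> another <object>"
asg = AbstractSceneGraph()
asg.add_object(0, region=(0.1, 0.2, 0.5, 0.9))
asg.add_object(1, region=(0.5, 0.4, 0.9, 0.9))
asg.add_attribute(2, object_id=0)
asg.add_relationship(3, subject_id=0, object_id=1)
```

Because such a graph specifies only which regions to mention, which of them get attributes, and which pairs are related, a user (or an automatic sampler) can vary the graph to control how detailed and how diverse the resulting captions are.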