用于更好图像描述的场景图生成吗? (Scene Graph Generation for Better Image Captioning?) - 专知论文

会员服务 ·

0

图像字幕 · 图 · Better · MoDELS · 可辨认的 ·

2021 年 9 月 23 日

Scene Graph Generation for Better Image Captioning?

翻译：用于更好图像描述的场景图生成吗?

Maximilian Mozes,Martin Schmitt,Vladimir Golkov,Hinrich Schütze,Daniel Cremers

from arxiv, Technical report. This work was done and the paper was written in 2019

We investigate the incorporation of visual relationships into the task of supervised image caption generation by proposing a model that leverages detected objects and auto-generated visual relationships to describe images in natural language. To do so, we first generate a scene graph from raw image pixels by identifying individual objects and visual relationships between them. This scene graph then serves as input to our graph-to-text model, which generates the final caption. In contrast to previous approaches, our model thus explicitly models the detection of objects and visual relationships in the image. For our experiments we construct a new dataset from the intersection of Visual Genome and MS COCO, consisting of images with both a corresponding gold scene graph and human-authored caption. Our results show that our methods outperform existing state-of-the-art end-to-end models that generate image descriptions directly from raw input pixels when compared in terms of the BLEU and METEOR evaluation metrics.

翻译：我们研究将视觉关系纳入监督图像字幕生成的任务,方法是提出一种模型,利用已检测到的天体和自动生成的视觉关系来用自然语言描述图像。为此,我们首先通过识别原始图像像素来生成一个场景图, 识别单个天体和它们之间的视觉关系。这个场景图然后作为图形到文字模型的输入, 生成最终字幕。与以往的方法不同, 我们的模型因此明确地模拟图像中天体的探测和视觉关系。对于我们的实验, 我们从视觉基因组和 MS COCO 的交叉点构建了一个新的数据集, 由图像组成, 包括一个相应的金色场景图和人类授权的字幕。我们的结果显示, 我们的方法超越了现有的最先进的端到端模型, 这些模型直接从原始输入像素中生成图像描述, 用 BLEU 和 METEOR 评估度量来比较。

0

相关内容

图像字幕

图像字幕（Image Captioning）,是指从图像生成文本描述的过程，主要根据图像中物体和物体的动作。

最新《图像描述Image Captioning》综述论文，22页pdf220篇文献

专知会员服务

43+阅读 · 2021年7月17日

如何做好一场报告？斯坦福Kayvon教授《清晰报告指南》为您讲解，附69页ppt

如何做好一场报告？斯坦福Kayvon教授《清晰报告指南》为您讲解，附69页ppt

专知会员服务

51+阅读 · 2021年5月27日

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

专知会员服务

25+阅读 · 2020年5月22日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【AAAI2020知识图谱论文概述】Knowledge Graphs @ AAAI 2020

【AAAI2020知识图谱论文概述】Knowledge Graphs @ AAAI 2020

专知会员服务

134+阅读 · 2020年2月13日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

自适应注意力机制在Image Caption中的应用

自适应注意力机制在Image Caption中的应用

PaperWeekly

10+阅读 · 2018年5月10日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

Question-controlled Text-aware Image Captioning

Arxiv

10+阅读 · 2021年8月4日

Hierarchy Parsing for Image Captioning

Hierarchy Parsing for Image Captioning

Arxiv

6+阅读 · 2019年9月10日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Knowledge-Embedded Routing Network for Scene Graph Generation

Arxiv

5+阅读 · 2019年3月8日

Using Scene Graph Context to Improve Image Generation

Using Scene Graph Context to Improve Image Generation

Arxiv

3+阅读 · 2019年1月15日

Unsupervised Image Captioning

Arxiv

7+阅读 · 2018年11月27日

Attentive Relational Networks for Mapping Images to Scene Graphs

Arxiv

3+阅读 · 2018年11月26日

Entity-aware Image Caption Generation

Arxiv

4+阅读 · 2018年11月7日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Joint Image Captioning and Question Answering

Arxiv

6+阅读 · 2018年5月22日

VIP会员

文章信息

相关主题

相关VIP内容

最新《图像描述Image Captioning》综述论文，22页pdf220篇文献

专知会员服务

43+阅读 · 2021年7月17日

如何做好一场报告？斯坦福Kayvon教授《清晰报告指南》为您讲解，附69页ppt

如何做好一场报告？斯坦福Kayvon教授《清晰报告指南》为您讲解，附69页ppt

专知会员服务

51+阅读 · 2021年5月27日

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

专知会员服务

25+阅读 · 2020年5月22日

【ACL2020】对抗性文本生成，Improving Adversarial Text Generation

专知会员服务

52+阅读 · 2020年5月5日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

【AAAI2020知识图谱论文概述】Knowledge Graphs @ AAAI 2020

【AAAI2020知识图谱论文概述】Knowledge Graphs @ AAAI 2020

专知会员服务

134+阅读 · 2020年2月13日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

【CMU卡内基梅隆大学】深度学习在计算机视觉的应用：方法，解释，因果与公平性

专知会员服务

83+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《复杂工程系统模型驱动设计决策支持系统：早期设计阶段挑战》最新138页

《日本陆上自卫队2040年作战方式与未来作战研究》最新23页slides

人工智能作为战争武器

《后勤保障》最新23页

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

逆强化学习-学习人先验的动机

逆强化学习-学习人先验的动机

CreateAMind

16+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

自适应注意力机制在Image Caption中的应用

自适应注意力机制在Image Caption中的应用

PaperWeekly

10+阅读 · 2018年5月10日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

【论文推荐】最新5篇图像分割（Image Segmentation）相关论文—多重假设、超像素分割、自监督、图、生成对抗网络

专知

27+阅读 · 2018年2月7日

Capsule Networks解析

Capsule Networks解析

机器学习研究会

11+阅读 · 2017年11月12日

Auto-Encoding GAN

Auto-Encoding GAN

CreateAMind

7+阅读 · 2017年8月4日

相关论文

Question-controlled Text-aware Image Captioning

Arxiv

10+阅读 · 2021年8月4日

Hierarchy Parsing for Image Captioning

Hierarchy Parsing for Image Captioning

Arxiv

6+阅读 · 2019年9月10日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Knowledge-Embedded Routing Network for Scene Graph Generation

Arxiv

5+阅读 · 2019年3月8日

Using Scene Graph Context to Improve Image Generation

Using Scene Graph Context to Improve Image Generation

Arxiv

3+阅读 · 2019年1月15日

Unsupervised Image Captioning

Arxiv

7+阅读 · 2018年11月27日

Attentive Relational Networks for Mapping Images to Scene Graphs

Arxiv

3+阅读 · 2018年11月26日

Entity-aware Image Caption Generation

Arxiv

4+阅读 · 2018年11月7日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Joint Image Captioning and Question Answering

Arxiv

6+阅读 · 2018年5月22日

微信扫码咨询专知VIP会员