Learning from image-text data has recently demonstrated success for many recognition tasks, yet it is currently limited to visual features or individual visual concepts such as objects. In this paper, we propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph. To bridge the gap between images and text, we leverage an off-the-shelf object detector to identify and localize object instances, match the labels of detected regions to concepts parsed from the captions, and thus create "pseudo" labels for learning scene graphs. Further, we design a Transformer-based model to predict these "pseudo" labels via a masked token prediction task. Learning from only image-sentence pairs, our model achieves a 30% relative gain over a recent method trained with human-annotated unlocalized scene graphs. Our model also shows strong results for weakly- and fully-supervised scene graph generation. In addition, we explore an open-vocabulary setting for detecting scene graphs, and present the first result for open-set scene graph generation. Our code is available at https://github.com/YiwuZhong/SGG_from_NLS.
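The following is a minimal sketch of the pseudo-label creation step described above, under stated assumptions: a detector output given as (box, class name) pairs, and relation triplets already parsed from a caption. The helper name `create_pseudo_labels` and the exact-string label matching are illustrative simplifications, not the authors' actual implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def create_pseudo_labels(
    detections: List[Tuple[Box, str]],            # localized regions with detector class names
    caption_triplets: List[Tuple[str, str, str]],  # (subject, predicate, object) parsed from a caption
) -> List[Tuple[Box, str, Box]]:
    """Ground caption triplets onto detected regions by matching labels."""
    pseudo_labels = []
    for subj, pred, obj in caption_triplets:
        # Match each parsed concept to detected regions whose class name agrees.
        subj_boxes = [box for box, name in detections if name == subj]
        obj_boxes = [box for box, name in detections if name == obj]
        # Every matched (subject box, predicate, object box) combination becomes
        # a "pseudo" relationship label for training the scene graph model.
        for sb in subj_boxes:
            for ob in obj_boxes:
                if sb != ob:
                    pseudo_labels.append((sb, pred, ob))
    return pseudo_labels


# Example: detections and triplets for an image captioned "a man riding a horse".
dets = [((10, 20, 120, 260), "man"), ((80, 150, 400, 380), "horse")]
triplets = [("man", "riding", "horse")]
print(create_pseudo_labels(dets, triplets))
```

These pseudo labels then serve as training targets for the Transformer-based model's masked token prediction task.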