争取以快速微调方式制作开放词汇场景图 (Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning) - 专知论文

会员服务 ·

0

情景 · 图 · Extensibility · 推断 · 类别 ·

2022 年 9 月 7 日

Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning

翻译：争取以快速微调方式制作开放词汇场景图

Tao He,Lianli Gao,Jingkuan Song,Yuan-Fang Li

Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image. The prevailing SGG methods require all object classes to be given in the training set. Such a closed setting limits the practical application of SGG. In this paper, we introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes but is required to infer relations for unseen target object classes. To this end, we propose a two-step method that firstly pre-trains on large amounts of coarse-grained region-caption data and then leverages two prompt-based techniques to finetune the pre-trained model without updating its parameters. Moreover, our method can support inference over completely unseen object classes, which existing methods are incapable of handling. On extensive experiments on three benchmark datasets, Visual Genome, GQA, and Open-Image, our method significantly outperforms recent, strong SGG methods on the setting of Ov-SGG, as well as on the conventional closed SGG.

翻译：场景图形生成( SGG) 是一项基本任务,旨在检测图像中对象之间的视觉关系。流行的 SGG 方法要求所有对象类别在训练组中提供。这种封闭式设置限制了SGG的实际应用。在本文中,我们引入了开放式词汇场景图形生成,这是一个新颖、现实和具有挑战性的设置,在一组基本对象类别上对模型进行了培训,但需要据此推断看不见目标对象类别的关系。为此,我们提出了一个两步方法,首先在大量粗略区域覆盖数据上进行预演,然后利用两种即时技术在不更新参数的情况下对预培训模型进行微调。此外,我们的方法可以支持对现有方法无法处理的完全看不见的物体类别进行推断。在对三个基准数据集(视觉基因组、GQA和Open-Image)进行的广泛实验中,我们的方法大大超越了在设计Ov- SGG 上以及常规封闭的 SGGG 上最近的强大SGG 方法。

0

相关内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

全局性气动外形优化中的流场加速求解新方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

我国梨野生种质的遗传多样性及谱系地理学研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于WorldView-3和OP-ELM的矿化蚀变提取方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

冰川水文水化学过程研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于氯通道差异性表达的Disulfiram-Cu靶向抗肿瘤作用及其机制

国家自然科学基金

0+阅读 · 2012年12月31日

玉米大斑病菌水甘油通道蛋白StFps1基因的克隆与功能研究

国家自然科学基金

0+阅读 · 2011年12月31日

MRTF-A调控CYR61介导间充质干细胞向内皮细胞分化的分子机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

荞麦属植物种子蛋白亚基基因序列及其在染色体上的分布规律研究

国家自然科学基金

0+阅读 · 2011年12月31日

桃果实中脱落酸受体ABAR的功能研究

国家自然科学基金

0+阅读 · 2009年12月31日

约化群酉表示的branching law及其应用

国家自然科学基金

0+阅读 · 2009年12月31日

Prompting through Prototype: A Prototype-based Prompt Learning on Pretrained Vision-Language Models

Arxiv

0+阅读 · 2022年10月19日

Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs

Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs

Arxiv

0+阅读 · 2022年10月19日

Towards Accurate Subgraph Similarity Computation via Neural Graph Pruning

Arxiv

0+阅读 · 2022年10月19日

Towards Efficient and Effective Self-Supervised Learning of Visual Representations

Arxiv

0+阅读 · 2022年10月18日

DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Generation

Arxiv

0+阅读 · 2022年10月18日

RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses

Arxiv

0+阅读 · 2022年10月12日

Scene Graph Generation: A Comprehensive Survey

Arxiv

26+阅读 · 2022年1月3日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

VIP会员

文章信息

相关主题

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

【深度学习表格检测、信息提取和结构化】《Table Detection, Information Extraction and Structuring using Deep Learning》by Vihar Kurama

专知会员服务

38+阅读 · 2020年1月23日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【CMU博士论文】数据驱动决策中的激励、信息与不确定性

DGP双粒度提示框架：图增强大模型助力欺诈检测

【ICCV2025】ESSENTIAL：用于视频类增量学习的情景记忆与语义记忆整合

唯快不破：大型语言模型高效架构综述

相关资讯

IEEE ICKG 2022: Call for Papers

IEEE ICKG 2022: Call for Papers

机器学习与推荐算法

3+阅读 · 2022年3月30日

IEEE TII Call For Papers

IEEE TII Call For Papers

CCF多媒体专委会

3+阅读 · 2022年3月24日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium5

中国图象图形学学会CSIG

1+阅读 · 2021年11月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

Prompting through Prototype: A Prototype-based Prompt Learning on Pretrained Vision-Language Models

Arxiv

0+阅读 · 2022年10月19日

Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs

Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs

Arxiv

0+阅读 · 2022年10月19日

Towards Accurate Subgraph Similarity Computation via Neural Graph Pruning

Arxiv

0+阅读 · 2022年10月19日

Towards Efficient and Effective Self-Supervised Learning of Visual Representations

Arxiv

0+阅读 · 2022年10月18日

DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Generation

Arxiv

0+阅读 · 2022年10月18日

RankT5: Fine-Tuning T5 for Text Ranking with Ranking Losses

Arxiv

0+阅读 · 2022年10月12日

Scene Graph Generation: A Comprehensive Survey

Arxiv

26+阅读 · 2022年1月3日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

相关基金

全局性气动外形优化中的流场加速求解新方法研究

国家自然科学基金

0+阅读 · 2015年12月31日

我国梨野生种质的遗传多样性及谱系地理学研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于WorldView-3和OP-ELM的矿化蚀变提取方法研究

国家自然科学基金

0+阅读 · 2013年12月31日

冰川水文水化学过程研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于氯通道差异性表达的Disulfiram-Cu靶向抗肿瘤作用及其机制

国家自然科学基金

0+阅读 · 2012年12月31日

玉米大斑病菌水甘油通道蛋白StFps1基因的克隆与功能研究

国家自然科学基金

0+阅读 · 2011年12月31日

MRTF-A调控CYR61介导间充质干细胞向内皮细胞分化的分子机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

荞麦属植物种子蛋白亚基基因序列及其在染色体上的分布规律研究

国家自然科学基金

0+阅读 · 2011年12月31日

桃果实中脱落酸受体ABAR的功能研究

国家自然科学基金

0+阅读 · 2009年12月31日

约化群酉表示的branching law及其应用

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员