Position-guided Text Prompt for Vision-Language Pre-training - 专知论文

会员服务 ·

0

块 · MoDELS · BLIP · Prompt · 推断 ·

2023 年 6 月 7 日

Position-guided Text Prompt for Vision-Language Pre-training

翻译：暂无翻译

Alex Jinpeng Wang,Pan Zhou,Mike Zheng Shou,Shuicheng Yan

from arxiv, Camera-ready version, code is in https://github.com/sail-sg/ptp

Vision-Language Pre-Training (VLP) has shown promising capabilities to align image and text pairs, facilitating a broad variety of cross-modal learning tasks. However, we observe that VLP models often lack the visual grounding/localization capability which is critical for many downstream tasks such as visual reasoning. In this work, we propose a novel Position-guided Text Prompt (PTP) paradigm to enhance the visual grounding ability of cross-modal models trained with VLP. Specifically, in the VLP phase, PTP divides the image into $N\times N$ blocks, and identifies the objects in each block through the widely used object detector in VLP. It then reformulates the visual grounding task into a fill-in-the-blank problem given a PTP by encouraging the model to predict the objects in the given blocks or regress the blocks of a given object, e.g. filling `P" or ``O" in aPTP ``The block P has a O". This mechanism improves the visual grounding capability of VLP models and thus helps them better handle various downstream tasks. By introducing PTP into several state-of-the-art VLP frameworks, we observe consistently significant improvements across representative cross-modal learning model architectures and several benchmarks, e.g. zero-shot Flickr30K Retrieval (+4.8 in average recall@1) for ViLT \cite{vilt} baseline, and COCO Captioning (+5.3 in CIDEr) for SOTA BLIP \cite{blip} baseline. Moreover, PTP achieves comparable results with object-detector based methods, and much faster inference speed since PTP discards its object detector for inference while the later cannot. Our code and pre-trained weight will be released at \url{https://github.com/sail-sg/ptp}.

翻译：暂无翻译

0

相关内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

20篇「ACL2020」最新论文抢先看！看自然语言处理2020在研究什么？

20篇「ACL2020」最新论文抢先看！看自然语言处理2020在研究什么？

专知会员服务

97+阅读 · 2020年4月10日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

长链非编码RNA-HOTAIR在前列腺癌恶性进展及肿瘤干细胞中的作用及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

雄性激素受体通过CCRK促进胃癌细胞增殖的分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

MALAT1通过下调miR-125b促进膀胱癌进展的分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

面向大规模城市监控视频检索的语义属性研究

国家自然科学基金

1+阅读 · 2013年12月31日

IGFBP-3联合IL-24上调miR-218抑制mTORC2/AKT/FOXO通路活化对前列腺癌恶性进展及血管生成的作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于语义分析的数据库交互技术

国家自然科学基金

0+阅读 · 2012年12月31日

LIMK1：罗格列酮抑制人胃癌细胞增殖、迁移及侵袭的作用靶点

国家自然科学基金

0+阅读 · 2012年12月31日

图像语义自动文本描述技术研究

国家自然科学基金

2+阅读 · 2012年12月31日

Zic1基因启动子高甲基化及其胃癌抑制作用的分子机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

有机半导体超晶格的研究

国家自然科学基金

0+阅读 · 2008年12月31日

Cross-Modal Concept Learning and Inference for Vision-Language Models

Arxiv

0+阅读 · 2023年7月28日

HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

Arxiv

0+阅读 · 2023年7月26日

Leveraging Large Language Models for Mental Health Prediction via Online Text Data

Arxiv

0+阅读 · 2023年7月26日

Vision-Language Pre-training: Basics, Recent Advances, and Future Trends

Arxiv

28+阅读 · 2022年10月17日

Prompt Distribution Learning

Arxiv

14+阅读 · 2022年5月6日

Conditional Prompt Learning for Vision-Language Models

Conditional Prompt Learning for Vision-Language Models

Arxiv

13+阅读 · 2022年3月10日

A Survey on Vision Transformer

Arxiv

17+阅读 · 2022年2月23日

VLP: A Survey on Vision-Language Pre-training

VLP: A Survey on Vision-Language Pre-training

Arxiv

11+阅读 · 2022年2月21日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

VIP会员

文章信息

相关主题

相关VIP内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

20篇「ACL2020」最新论文抢先看！看自然语言处理2020在研究什么？

20篇「ACL2020」最新论文抢先看！看自然语言处理2020在研究什么？

专知会员服务

97+阅读 · 2020年4月10日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

视觉-语言-动作模型解析：从模块构成到里程碑与挑战

《解析陆域作战方向：一个概念性框架》报告

【博士论文】基于多模态基础模型的上下文学习

追寻真正的AI自主性：从遗留思维到战场优势

相关资讯

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

无监督元学习表示学习

无监督元学习表示学习

CreateAMind

27+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

谷歌发表的史上最强NLP模型BERT的官方代码和预训练模型可以下载了

AINLP

12+阅读 · 2018年11月1日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

相关论文

Cross-Modal Concept Learning and Inference for Vision-Language Models

Arxiv

0+阅读 · 2023年7月28日

HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models

Arxiv

0+阅读 · 2023年7月26日

Leveraging Large Language Models for Mental Health Prediction via Online Text Data

Arxiv

0+阅读 · 2023年7月26日

Vision-Language Pre-training: Basics, Recent Advances, and Future Trends

Arxiv

28+阅读 · 2022年10月17日

Prompt Distribution Learning

Arxiv

14+阅读 · 2022年5月6日

Conditional Prompt Learning for Vision-Language Models

Conditional Prompt Learning for Vision-Language Models

Arxiv

13+阅读 · 2022年3月10日

A Survey on Vision Transformer

Arxiv

17+阅读 · 2022年2月23日

VLP: A Survey on Vision-Language Pre-training

VLP: A Survey on Vision-Language Pre-training

Arxiv

11+阅读 · 2022年2月21日

Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Arxiv

30+阅读 · 2021年7月28日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

相关基金

长链非编码RNA-HOTAIR在前列腺癌恶性进展及肿瘤干细胞中的作用及机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

雄性激素受体通过CCRK促进胃癌细胞增殖的分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

MALAT1通过下调miR-125b促进膀胱癌进展的分子机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

面向大规模城市监控视频检索的语义属性研究

国家自然科学基金

1+阅读 · 2013年12月31日

IGFBP-3联合IL-24上调miR-218抑制mTORC2/AKT/FOXO通路活化对前列腺癌恶性进展及血管生成的作用机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于语义分析的数据库交互技术

国家自然科学基金

0+阅读 · 2012年12月31日

LIMK1：罗格列酮抑制人胃癌细胞增殖、迁移及侵袭的作用靶点

国家自然科学基金

0+阅读 · 2012年12月31日

图像语义自动文本描述技术研究

国家自然科学基金

2+阅读 · 2012年12月31日

Zic1基因启动子高甲基化及其胃癌抑制作用的分子机制研究

国家自然科学基金

0+阅读 · 2009年12月31日

有机半导体超晶格的研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员