PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models


May 23, 2022


Yuan Yao, Qianyu Chen, Ao Zhang, Wei Ji, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun

Vision-language pre-training (VLP) has shown impressive performance on a wide range of cross-modal tasks, and VLP models that do not rely on object detectors are becoming the mainstream because of their superior computational efficiency and competitive performance. However, removing the object detector also deprives VLP models of the capability for explicit object modeling, which is essential to various position-sensitive vision-language (VL) tasks, such as referring expression comprehension and visual commonsense reasoning. To address this challenge, we introduce PEVL, which enhances the pre-training and prompt tuning of VLP models with explicit object position modeling. Specifically, PEVL reformulates discretized object positions and language in a unified language modeling framework, which facilitates explicit VL alignment during pre-training and also enables flexible prompt tuning for various downstream tasks. We show that PEVL enables state-of-the-art performance of detector-free VLP models on position-sensitive tasks such as referring expression comprehension and phrase grounding, and also improves performance on position-insensitive tasks with grounded inputs. We make the data and code for this paper publicly available at https://github.com/thunlp/PEVL.
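The idea of putting discretized object positions and language into one token sequence can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bin count (`NUM_BINS = 512`), the `[pos_k]` token spelling, the `< ... >` delimiters, and the helper names `discretize_box`, `ground_phrase`, and `mask_position_tokens` are all assumptions made for illustration. Under those assumptions, a bounding box becomes four position tokens placed after the phrase it grounds, and masking those tokens turns localization into a fill-in-the-blank language modeling problem.

```python
from typing import List, Tuple

NUM_BINS = 512  # assumed number of position bins; the abstract does not state one


def discretize_box(box: Tuple[float, float, float, float],
                   image_size: Tuple[int, int],
                   num_bins: int = NUM_BINS) -> List[str]:
    """Map a pixel box (x1, y1, x2, y2) to four hypothetical position tokens."""
    x1, y1, x2, y2 = box
    width, height = image_size
    tokens = []
    for value, extent in ((x1, width), (y1, height), (x2, width), (y2, height)):
        # Normalize to [0, 1], quantize, and clamp so value == extent
        # falls into the last bin instead of overflowing.
        bin_index = min(int(value / extent * num_bins), num_bins - 1)
        tokens.append(f"[pos_{bin_index}]")
    return tokens


def ground_phrase(text: str, phrase: str,
                  box: Tuple[float, float, float, float],
                  image_size: Tuple[int, int]) -> str:
    """Insert position tokens right after the first mention of `phrase`."""
    position = " ".join(discretize_box(box, image_size))
    return text.replace(phrase, f"{phrase} < {position} >", 1)


def mask_position_tokens(sequence: str,
                         mask_token: str = "[MASK]") -> Tuple[str, List[str]]:
    """Mask every position token; return the masked text and the targets."""
    masked, targets = [], []
    for token in sequence.split():
        if token.startswith("[pos_"):
            masked.append(mask_token)
            targets.append(token)
        else:
            masked.append(token)
    return " ".join(masked), targets


grounded = ground_phrase("a dog chasing a ball", "a dog",
                         box=(0, 0, 320, 240), image_size=(640, 480))
masked_text, targets = mask_position_tokens(grounded)
print(grounded)
print(masked_text)
```

In this sketch the same masked sequence could serve either as a pre-training objective (recover the masked position tokens) or as a prompt for a downstream grounding task (predict the position tokens for a given phrase), which is the kind of flexibility the abstract describes.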

