愿景-语言培训前展望:基本知识、近期进展和未来趋势 (Vision-Language Pre-training: Basics, Recent Advances, and Future Trends) - 专知论文

会员服务 ·

0

自动问答 · Vision · 图片分类 · state-of-the-art · MoDELS ·

2022 年 10 月 17 日

Vision-Language Pre-training: Basics, Recent Advances, and Future Trends

翻译：愿景-语言培训前展望:基本知识、近期进展和未来趋势

Zhe Gan,Linjie Li,Chunyuan Li,Lijuan Wang,Zicheng Liu,Jianfeng Gao

from arxiv, A survey paper/book on Vision-Language Pre-training (102 pages)

This paper surveys vision-language pre-training (VLP) methods for multimodal intelligence that have been developed in the last few years. We group these approaches into three categories: ($i$) VLP for image-text tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding; ($ii$) VLP for core computer vision tasks, such as (open-set) image classification, object detection, and segmentation; and ($iii$) VLP for video-text tasks, such as video captioning, video-text retrieval, and video question answering. For each category, we present a comprehensive review of state-of-the-art methods, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies. In addition, for each category, we discuss advanced topics being actively explored in the research community, such as big foundation models, unified modeling, in-context few-shot learning, knowledge, robustness, and computer vision in the wild, to name a few.

翻译：本文调查过去几年中开发的多式联运情报的视觉语言前培训方法。我们将这些方法分为三类:图像文本任务,如图像字幕、图像文本检索、视觉问答和视觉地面定位等,(美元)VLP;核心计算机视觉任务,如(开放式)图像分类、物体探测和分解等,(二)VLP;视频文本任务,如视频字幕、视频文本检索和视频问题解答,(三)VLP。对于每一类,我们提出对最新方法的全面审查,并讨论已经取得的进展和仍然面临的挑战,使用具体的系统和模型作为案例研究。此外,我们还讨论研究界正在积极探讨的先进课题,如大基础模型、统一模型、文字中的少量学习、知识、坚固性和计算机在野生中的视觉,等等。

28

相关内容

自动问答

自动问答（Question Answering, QA）是指利用计算机自动回答用户所提出的问题以满足用户知识需求的任务。不同于现有搜索引擎，问答系统是信息服务的一种高级形式，系统返回用户的不再是基于关键词匹配排序的文档列表，而是精准的自然语言答案。近年来，随着人工智能的飞速发展，自动问答已经成为倍受关注且发展前景广泛的研究方向。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

【视觉和语言导航:任务、方法和未来方向的综述】Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

【视觉和语言导航:任务、方法和未来方向的综述】Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

专知会员服务

36+阅读 · 2022年3月25日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

中国图象图形学学会CSIG

0+阅读 · 2021年11月15日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

炎症作用下circ_0007986/miRNA调控食管癌细胞耐药促进肿瘤转移机制研究

国家自然科学基金

0+阅读 · 2016年12月31日

EZH2／STAT3／FoxO1信号通路调控口腔鳞癌EMT相关糖代谢和 “失巢凋亡” 逃逸的分子机制

国家自然科学基金

0+阅读 · 2015年12月31日

一种乳腺癌分子特异性手术导航成像方法

国家自然科学基金

1+阅读 · 2015年12月31日

NF-kappaB/miR-23a/GSL1通路在鼻咽癌放疗抵抗中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

PPARγ激动剂增敏ABT-263抗肝细胞癌的作用及分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

miRNA-205介导炎症网络调控乳腺癌转移的机制探索

国家自然科学基金

0+阅读 · 2012年12月31日

miR-370-LIN28A信号通路在肝癌发生发展中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

RI调控ILK信号通路抑制膀胱癌发生EMT及转移的分子机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

Hedgehog信号通路调控宫颈癌上皮间质转化的作用及机制

国家自然科学基金

0+阅读 · 2011年12月31日

活性氧介导的NSC-741909抗肿瘤作用分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

Arxiv

0+阅读 · 2022年11月18日

Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

Arxiv

48+阅读 · 2022年9月7日

Pre-training Methods in Information Retrieval

Arxiv

16+阅读 · 2021年11月27日

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

Arxiv

31+阅读 · 2021年11月1日

Advances in Multi-turn Dialogue Comprehension: A Survey

Arxiv

23+阅读 · 2021年10月11日

Recent Advances of Continual Learning in Computer Vision: An Overview

Recent Advances of Continual Learning in Computer Vision: An Overview

Arxiv

22+阅读 · 2021年9月23日

Recent Advances and Trends in Multimodal Deep Learning: A Review

Arxiv

57+阅读 · 2021年5月24日

Recent Advances in Deep Learning-based Dialogue Systems

Arxiv

18+阅读 · 2021年5月10日

A Survey on Dialogue Systems: Recent Advances and New Frontiers

Arxiv

11+阅读 · 2018年1月11日

Multimodal Machine Learning: A Survey and Taxonomy

Arxiv

151+阅读 · 2017年8月1日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

【视觉和语言导航:任务、方法和未来方向的综述】Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

【视觉和语言导航:任务、方法和未来方向的综述】Vision-and-Language Navigation: A Survey of Tasks, Methods, and Future Directions

专知会员服务

36+阅读 · 2022年3月25日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

165+阅读 · 2020年3月18日

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

【新书】数字图像(影像)处理手第二版，2176pdf，Mathematical Methods in Imaging

专知会员服务

93+阅读 · 2020年2月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

【深度学习视频分析/多模态学习资源大列表】

【深度学习视频分析/多模态学习资源大列表】

专知会员服务

92+阅读 · 2019年10月16日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《"半人马"训练计划：美国陆军北方司令部兵棋推演与Scale AI系统集成》最新报告

《飞行自组织网络通信协议评估体系：三维高斯-马尔科夫移动模型的创新升级》172页

地面无人作战平台：现代战争中的机器士兵

《面向边缘智能应用的AI模型优化技术研究》139页

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium7

中国图象图形学学会CSIG

0+阅读 · 2021年11月15日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium2

中国图象图形学学会CSIG

0+阅读 · 2021年11月8日

【ICIG2021】Latest News & Announcements of the Plenary Talk2

【ICIG2021】Latest News & Announcements of the Plenary Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年11月2日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

相关论文

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

Arxiv

0+阅读 · 2022年11月18日

Foundations and Recent Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

Arxiv

48+阅读 · 2022年9月7日

Pre-training Methods in Information Retrieval

Arxiv

16+阅读 · 2021年11月27日

Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey

Arxiv

31+阅读 · 2021年11月1日

Advances in Multi-turn Dialogue Comprehension: A Survey

Arxiv

23+阅读 · 2021年10月11日

Recent Advances of Continual Learning in Computer Vision: An Overview

Recent Advances of Continual Learning in Computer Vision: An Overview

Arxiv

22+阅读 · 2021年9月23日

Recent Advances and Trends in Multimodal Deep Learning: A Review

Arxiv

57+阅读 · 2021年5月24日

Recent Advances in Deep Learning-based Dialogue Systems

Arxiv

18+阅读 · 2021年5月10日

A Survey on Dialogue Systems: Recent Advances and New Frontiers

Arxiv

11+阅读 · 2018年1月11日

Multimodal Machine Learning: A Survey and Taxonomy

Arxiv

151+阅读 · 2017年8月1日

相关基金

炎症作用下circ_0007986/miRNA调控食管癌细胞耐药促进肿瘤转移机制研究

国家自然科学基金

0+阅读 · 2016年12月31日

EZH2／STAT3／FoxO1信号通路调控口腔鳞癌EMT相关糖代谢和 “失巢凋亡” 逃逸的分子机制

国家自然科学基金

0+阅读 · 2015年12月31日

一种乳腺癌分子特异性手术导航成像方法

国家自然科学基金

1+阅读 · 2015年12月31日

NF-kappaB/miR-23a/GSL1通路在鼻咽癌放疗抵抗中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

PPARγ激动剂增敏ABT-263抗肝细胞癌的作用及分子机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

miRNA-205介导炎症网络调控乳腺癌转移的机制探索

国家自然科学基金

0+阅读 · 2012年12月31日

miR-370-LIN28A信号通路在肝癌发生发展中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

RI调控ILK信号通路抑制膀胱癌发生EMT及转移的分子机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

Hedgehog信号通路调控宫颈癌上皮间质转化的作用及机制

国家自然科学基金

0+阅读 · 2011年12月31日

活性氧介导的NSC-741909抗肿瘤作用分子机制

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员