Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning - Zhuanzhi Papers

September 14, 2022

Yujia Xie, Luowei Zhou, Xiyang Dai, Lu Yuan, Nguyen Bach, Ce Liu, Michael Zeng

People say, "A picture is worth a thousand words." How, then, can we get that rich information out of an image? We argue that by using visual clues to bridge large pretrained vision foundation models and language models, we can do so without any extra cross-modal training. Thanks to the strong zero-shot capability of foundation models, we first use a vision foundation model to construct a rich semantic representation of the image (e.g., image tags, object attributes and locations, captions) as a structured textual prompt, which we call visual clues. Based on these visual clues, we use a large language model to produce a series of comprehensive descriptions of the visual content, which are then verified by the vision model to select the candidate that aligns best with the image. We evaluate the quality of the generated descriptions through quantitative and qualitative measurements. The results demonstrate the effectiveness of such a structured semantic representation.
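The pipeline described above (visual clues, then a structured text prompt, then several language-model candidates, then a verification step that keeps the best-aligned one) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `VisualClues` and `DetectedObject` types, the prompt layout, the `generate` callable standing in for the large language model, and the `tag_overlap_score` function standing in for the vision model's image-text check are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class DetectedObject:
    """One object from the vision model (hypothetical schema)."""
    name: str
    box: Tuple[float, float, float, float]  # assumed normalized (x0, y0, x1, y1)
    attributes: List[str] = field(default_factory=list)


@dataclass
class VisualClues:
    """Tags, objects, and captions produced by the vision foundation model."""
    tags: List[str]
    objects: List[DetectedObject]
    captions: List[str]


def build_prompt(clues: VisualClues) -> str:
    """Serialize visual clues into a structured text prompt (layout is an assumption)."""
    lines = ["Tags: " + ", ".join(clues.tags), "Objects:"]
    for obj in clues.objects:
        x0, y0, x1, y1 = obj.box
        label = " ".join(obj.attributes + [obj.name])  # e.g. "brown dog"
        lines.append(f"- {label} at ({x0:.2f}, {y0:.2f}, {x1:.2f}, {y1:.2f})")
    lines.append("Captions:")
    lines.extend(f"- {c}" for c in clues.captions)
    lines.append("Describe the image in one detailed paragraph.")
    return "\n".join(lines)


def tag_overlap_score(clues: VisualClues, paragraph: str) -> float:
    """Toy stand-in for the vision model's verification step:
    the fraction of tags that appear as whole words in the paragraph."""
    if not clues.tags:
        return 0.0
    text = paragraph.lower()
    hits = sum(
        1 for t in clues.tags if re.search(rf"\b{re.escape(t.lower())}\b", text)
    )
    return hits / len(clues.tags)


def describe(
    clues: VisualClues,
    generate: Callable[[str, int], Sequence[str]],
    score: Callable[[VisualClues, str], float] = tag_overlap_score,
    num_candidates: int = 3,
) -> str:
    """Ask the language model for several candidates and keep the best-scoring one."""
    prompt = build_prompt(clues)
    candidates = list(generate(prompt, num_candidates))
    if not candidates:
        raise ValueError("language model returned no candidates")
    return max(candidates, key=lambda p: score(clues, p))


if __name__ == "__main__":
    clues = VisualClues(
        tags=["dog", "frisbee", "grass"],
        objects=[DetectedObject("dog", (0.10, 0.40, 0.55, 0.95), ["brown"])],
        captions=["a dog catching a frisbee"],
    )

    def fake_llm(prompt: str, n: int) -> List[str]:
        # Hypothetical language model that ignores the prompt.
        return [
            "A brown dog leaps across the grass to catch a frisbee.",
            "A cat sleeps on a sofa.",
        ][:n]

    print(build_prompt(clues))
    print(describe(clues, fake_llm))
```

In a real system, `score` would presumably be replaced by an image-text similarity computed by the vision foundation model itself; that verification step is what lets the pipeline discard candidates that describe content not present in the image.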