What does CLIP know about a red circle? Visual prompt engineering for VLMs - Zhuanzhi Papers

Prompt engineering · Zero-shot · Image space · Referring expressions · Point localization

April 13, 2023

What does CLIP know about a red circle? Visual prompt engineering for VLMs

Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi

Large-scale Vision-Language Models, such as CLIP, learn powerful image-text representations that have found numerous applications, from zero-shot classification to text-to-image generation. Despite that, their capabilities for solving novel discriminative tasks via prompting fall behind those of large language models, such as GPT-3. Here we explore the idea of visual prompt engineering for solving computer vision tasks beyond classification by editing in image space instead of text. In particular, we discover an emergent ability of CLIP, where, by simply drawing a red circle around an object, we can direct the model's attention to that region, while also maintaining global information. We show the power of this simple approach by achieving state-of-the-art in zero-shot referring expressions comprehension and strong performance in keypoint localization tasks. Finally, we draw attention to some potential ethical concerns of large language-vision models.
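The red-circle idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Pillow for drawing, and the names `draw_red_circle`, `pick_region`, `pad`, `width`, and `score_fn` are hypothetical. `score_fn` stands in for any image-text similarity function, for example a CLIP similarity between the marked image and a fixed text query, and is left to the reader to supply.

```python
from typing import Callable, Sequence, Tuple

from PIL import Image, ImageDraw

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def draw_red_circle(image: Image.Image, box: Box, pad: int = 5, width: int = 3) -> Image.Image:
    """Return a copy of `image` with a red ellipse outlined around `box`.

    Only the outline is drawn, so the rest of the image (the global
    context) is left untouched. `pad` and `width` are illustrative defaults.
    """
    marked = image.copy().convert("RGB")
    left, top, right, bottom = box
    ImageDraw.Draw(marked).ellipse(
        (left - pad, top - pad, right + pad, bottom + pad),
        outline=(255, 0, 0),
        width=width,
    )
    return marked


def pick_region(
    image: Image.Image,
    boxes: Sequence[Box],
    score_fn: Callable[[Image.Image], float],
) -> int:
    """Return the index of the candidate box whose circled image scores highest.

    `score_fn` is a placeholder (an assumption of this sketch) for an
    image-text similarity model evaluated against a fixed text query.
    """
    scores = [score_fn(draw_red_circle(image, box)) for box in boxes]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch each candidate region gets its own marked copy of the full image, so the scoring model sees both the highlighted region and its surroundings. How the scores are computed and normalized is left open here.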
