COLA: How to adapt vision-language models to Compose Objects Localized with Attributes?

May 5, 2023

Arijit Ray, Filip Radenovic, Abhimanyu Dubey, Bryan A. Plummer, Ranjay Krishna, Kate Saenko

Compositional reasoning is a hallmark of human visual intelligence; yet despite the size of large vision-language models, they struggle to represent simple compositions by combining objects with their attributes. To measure this lack of compositional capability, we design Cola, a text-to-image retrieval benchmark to Compose Objects Localized with Attributes. Using Cola as a testbed, we explore modeling designs to adapt pre-trained vision-language models to reason compositionally about multiple attributes attached to multiple objects. We explore 6 finetuning strategies on 2 seminal vision-language models, using 3 finetuning datasets and 2 test benchmarks (Cola and CREPE). Surprisingly, our optimal finetuning strategy improves a 151M parameter CLIP, which disjointly encodes image and language during pretraining, to perform as well as a 241M parameter FLAVA, which uses a multi-modal transformer encoder during pretraining to attend over both vision and language modalities. This optimal finetuning strategy is a lightweight multi-modal adapter that jointly attends over both image and language features generated by the pretrained model. We show this works better than common strategies such as prompt/fine-tuning, or tuning a comparable number of unimodal layers.
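The adapter idea described in the abstract, a small module that jointly attends over image and language features from a frozen pretrained model, can be illustrated with a rough sketch. This is not the authors' implementation: the class name `JointAttentionAdapter`, the single attention head, the mean pooling, the feature dimension, and the random stand-in features and weights are all assumptions made for illustration. In practice the token features would come from a pretrained model such as CLIP or FLAVA, and only the adapter weights would be trained.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class JointAttentionAdapter:
    """Hypothetical single-head adapter: self-attention over the
    concatenation of image tokens and text tokens, then a scalar score."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        # Random placeholders; these would be the only trained weights.
        self.w_q = rng.normal(0.0, scale, (dim, dim))
        self.w_k = rng.normal(0.0, scale, (dim, dim))
        self.w_v = rng.normal(0.0, scale, (dim, dim))
        self.w_out = rng.normal(0.0, scale, (dim,))
        self.dim = dim

    def score(self, image_tokens, text_tokens):
        # Joint sequence of shape (num_image_tokens + num_text_tokens, dim).
        tokens = np.concatenate([image_tokens, text_tokens], axis=0)
        q = tokens @ self.w_q
        k = tokens @ self.w_k
        v = tokens @ self.w_v
        # Every token attends over both modalities.
        attn = softmax(q @ k.T / np.sqrt(self.dim), axis=-1)
        fused = tokens + attn @ v      # residual connection
        pooled = fused.mean(axis=0)    # mean pooling (an assumption)
        return float(pooled @ self.w_out)


def rank_images(adapter, image_token_sets, text_tokens):
    """Text-to-image retrieval: order candidate images by score, best first."""
    scores = np.array([adapter.score(img, text_tokens) for img in image_token_sets])
    return np.argsort(-scores), scores


# Stand-in features: 3 candidate images with 4 tokens each, one 5-token query.
rng = np.random.default_rng(1)
dim = 16
images = [rng.normal(size=(4, dim)) for _ in range(3)]
text = rng.normal(size=(5, dim))

adapter = JointAttentionAdapter(dim)
order, scores = rank_images(adapter, images, text)
print("ranking (best first):", order.tolist())
```

Because the weights here are untrained, the ranking is arbitrary. The sketch only shows where joint attention sits relative to the frozen features and how a retrieval ranking falls out of the per-pair scores.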
