Vision-language models (VLMs) that use contrastive language-image pre-training have shown promising zero-shot classification performance. However, they perform relatively poorly on imbalanced datasets, where the class distribution of the training data is skewed and minority classes are consequently predicted poorly. For instance, CLIP achieves only 5% accuracy on the iNaturalist18 dataset. We propose to add a lightweight decoder to VLMs to avoid the out-of-memory (OOM) problem caused by the large number of classes and to capture nuanced features of the tail classes. We then explore improvements to VLMs via prompt tuning, fine-tuning, and the incorporation of imbalanced learning algorithms such as Focal Loss, Balanced Softmax, and Distribution Alignment. Experiments demonstrate that VLM performance can be further boosted when combined with the decoder and imbalanced learning methods. Specifically, our improved VLMs outperform zero-shot classification by 6.58%, 69.82%, and 6.17% average accuracy on ImageNet-LT, iNaturalist18, and Places-LT, respectively. We further analyze the influence of pre-training data size, backbone architecture, and training cost. Our study highlights the importance of imbalanced learning algorithms even for VLMs pre-trained on massive data. We release our code at https://github.com/Imbalance-VLM/Imbalance-VLM.
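For concreteness, below is a minimal PyTorch sketch of one of the imbalanced learning algorithms we incorporate, Balanced Softmax, which shifts each logit by the log of its class frequency so the loss compensates for label imbalance. The function name and tensor shapes here are illustrative only and do not reflect the exact implementation in our repository.

```python
import torch
import torch.nn.functional as F

def balanced_softmax_loss(logits: torch.Tensor,
                          labels: torch.Tensor,
                          class_counts: torch.Tensor) -> torch.Tensor:
    """Balanced Softmax loss (Ren et al., 2020).

    logits:       (batch, num_classes) raw classifier outputs
    labels:       (batch,) ground-truth class indices
    class_counts: (num_classes,) number of training samples per class
    """
    # Adding log(n_c) to class c's logit is equivalent to weighting the
    # softmax numerator/denominator by the class prior, which counteracts
    # the long-tailed label distribution during training.
    adjusted = logits + torch.log(class_counts.float().clamp(min=1))
    return F.cross_entropy(adjusted, labels)
```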