VL-LTR: 长期学习视觉识别的学习类级视觉语言代表 (VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition) - 专知论文

会员服务 ·

0

学成 · Performer · Extensibility · 模态 · state-of-the-art ·

2021 年 11 月 26 日

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

翻译：VL-LTR: 长期学习视觉识别的学习类级视觉语言代表

Changyao Tian,Wenhai Wang,Xizhou Zhu,Xiaogang Wang,Jifeng Dai,Yu Qiao

from arxiv, Technical report; 14 pages, 9 figures;

Deep learning-based models encounter challenges when processing long-tailed data in the real world. Existing solutions usually employ some balancing strategies or transfer learning to deal with the class imbalance problem, based on the image modality. In this work, we present a visual-linguistic long-tailed recognition framework, termed VL-LTR, and conduct empirical studies on the benefits of introducing text modality for long-tailed recognition (LTR). Compared to existing approaches, the proposed VL-LTR has the following merits. (1) Our method can not only learn visual representation from images but also learn corresponding linguistic representation from noisy class-level text descriptions collected from the Internet; (2) Our method can effectively use the learned visual-linguistic representation to improve the visual recognition performance, especially for classes with fewer image samples. We also conduct extensive experiments and set the new state-of-the-art performance on widely-used LTR benchmarks. Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which significantly outperforms the previous best method by over 17 points, and is close to the prevailing performance training on the full ImageNet. Code shall be released.

翻译：深层次的学习模型在现实世界中处理长尾数据时遇到挑战。现有解决方案通常采用一些平衡战略或转让学习方法,根据图像模式处理阶级不平衡问题。在这项工作中,我们提出了一个视觉语言长期老化识别框架(称为VL-LTR),并就引入长尾识别文本模式的好处进行经验研究。与现有方法相比,拟议的VL-LTR具有以下优点:(1) 我们的方法不仅可以从图像中学习视觉表现,还可以从从从从互联网收集的吵闹的阶级层次文字描述中学习相应的语言表现;(2) 我们的方法可以有效利用所学的视觉语言表现来改进视觉识别绩效,特别是图像样本较少的班级。我们还进行了广泛的实验,并将新的最新表现设置在广泛使用的LTR基准上。值得注意的是,我们的方法在图像网-LT上实现了77.2%的总体准确度,大大超过17个点,接近全图像网的当前业绩培训。

0

相关内容

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

专知会员服务

25+阅读 · 2020年7月1日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【Google】大迁移：通用视觉表示学习，General Visual Representation Learning

【Google】大迁移：通用视觉表示学习，General Visual Representation Learning

专知会员服务

37+阅读 · 2020年5月9日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【Google无监督大规模视觉表示迁移】Large Scale Learning of General Visual Representations for Transfer

【Google无监督大规模视觉表示迁移】Large Scale Learning of General Visual Representations for Transfer

专知会员服务

12+阅读 · 2020年1月7日

【互信息与自监督学习，32页ppt】'Notes and tutorials on "Mutual information and self-supervised learning‘“

【互信息与自监督学习，32页ppt】'Notes and tutorials on "Mutual information and self-supervised learning‘“

专知会员服务

26+阅读 · 2019年12月25日

【NeurIPS2019】模仿学习中的因果混乱问题 Causal Confusion in Imitation Learning

【NeurIPS2019】模仿学习中的因果混乱问题 Causal Confusion in Imitation Learning

专知会员服务

30+阅读 · 2019年12月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

视觉机械臂 visual-pushing-grasping

视觉机械臂 visual-pushing-grasping

CreateAMind

3+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【ICCV 2017论文集】计算机视觉顶级会议ICCV2017 Open Access Repository

【ICCV 2017论文集】计算机视觉顶级会议ICCV2017 Open Access Repository

专知

6+阅读 · 2017年10月14日

Visual Representation Learning with Self-Supervised Attention for Low-Label High-data Regime

Arxiv

0+阅读 · 2022年1月30日

SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training

Arxiv

0+阅读 · 2022年1月29日

Deep Long-Tailed Learning: A Survey

Arxiv

13+阅读 · 2021年10月9日

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations

Arxiv

10+阅读 · 2021年9月30日

MST: Masked Self-Supervised Transformer for Visual Representation

Arxiv

4+阅读 · 2021年6月10日

MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

Arxiv

7+阅读 · 2021年3月25日

Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot Recognition

Arxiv

8+阅读 · 2020年12月4日

Equalization Loss for Long-Tailed Object Recognition

Equalization Loss for Long-Tailed Object Recognition

Arxiv

5+阅读 · 2020年4月14日

Self-labelling via simultaneous clustering and representation learning

Self-labelling via simultaneous clustering and representation learning

Arxiv

3+阅读 · 2019年11月13日

Learning to Count Objects in Natural Images for Visual Question Answering

Arxiv

12+阅读 · 2018年2月15日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

神经常微分方程教程，50页ppt，A brief tutorial on Neural ODEs

专知会员服务

74+阅读 · 2020年8月2日

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

【ICML2020-伯克利-马毅老师组】深度等距学习的视觉识别，Deep Isometric Learning for Visual Recognition

专知会员服务

25+阅读 · 2020年7月1日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

【Google】大迁移：通用视觉表示学习，General Visual Representation Learning

【Google】大迁移：通用视觉表示学习，General Visual Representation Learning

专知会员服务

37+阅读 · 2020年5月9日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

253+阅读 · 2020年4月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【Google无监督大规模视觉表示迁移】Large Scale Learning of General Visual Representations for Transfer

【Google无监督大规模视觉表示迁移】Large Scale Learning of General Visual Representations for Transfer

专知会员服务

12+阅读 · 2020年1月7日

【互信息与自监督学习，32页ppt】'Notes and tutorials on "Mutual information and self-supervised learning‘“

【互信息与自监督学习，32页ppt】'Notes and tutorials on "Mutual information and self-supervised learning‘“

专知会员服务

26+阅读 · 2019年12月25日

【NeurIPS2019】模仿学习中的因果混乱问题 Causal Confusion in Imitation Learning

【NeurIPS2019】模仿学习中的因果混乱问题 Causal Confusion in Imitation Learning

专知会员服务

30+阅读 · 2019年12月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

《城市滨海地区：理解复杂多变环境下的指挥控制框架》50页报告

《理解城市战及其在俄乌战争中的表现》报告

美空军“顶点2025”实验：推进AI在C2、动态目标锁定与联盟集成中的应用

《建设式兵棋模拟作为战术集群配置优化的关键组成部分》

相关资讯

RoBERTa中文预训练模型：RoBERTa for Chinese

RoBERTa中文预训练模型：RoBERTa for Chinese

PaperWeekly

57+阅读 · 2019年9月16日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

Disentangled的假设的探讨

Disentangled的假设的探讨

CreateAMind

9+阅读 · 2018年12月10日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

视觉机械臂 visual-pushing-grasping

视觉机械臂 visual-pushing-grasping

CreateAMind

3+阅读 · 2018年5月25日

Hierarchical Disentangled Representations

Hierarchical Disentangled Representations

CreateAMind

4+阅读 · 2018年4月15日

【ICCV 2017论文集】计算机视觉顶级会议ICCV2017 Open Access Repository

【ICCV 2017论文集】计算机视觉顶级会议ICCV2017 Open Access Repository

专知

6+阅读 · 2017年10月14日

相关论文

Visual Representation Learning with Self-Supervised Attention for Low-Label High-data Regime

Arxiv

0+阅读 · 2022年1月30日

SPIRAL: Self-supervised Perturbation-Invariant Representation Learning for Speech Pre-Training

Arxiv

0+阅读 · 2022年1月29日

Deep Long-Tailed Learning: A Survey

Arxiv

13+阅读 · 2021年10月9日

CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations

Arxiv

10+阅读 · 2021年9月30日

MST: Masked Self-Supervised Transformer for Visual Representation

Arxiv

4+阅读 · 2021年6月10日

MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

Arxiv

7+阅读 · 2021年3月25日

Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot Recognition

Arxiv

8+阅读 · 2020年12月4日

Equalization Loss for Long-Tailed Object Recognition

Equalization Loss for Long-Tailed Object Recognition

Arxiv

5+阅读 · 2020年4月14日

Self-labelling via simultaneous clustering and representation learning

Self-labelling via simultaneous clustering and representation learning

Arxiv

3+阅读 · 2019年11月13日

Learning to Count Objects in Natural Images for Visual Question Answering

Arxiv

12+阅读 · 2018年2月15日

微信扫码咨询专知VIP会员