Rethink Long-tailed Recognition with Vision Transformers - 专知 Papers

Vision Transformer · Recognition · Unsupervised · Transformer · One-class classification

April 17, 2023

Rethink Long-tailed Recognition with Vision Transformers

Zhengzhuo Xu, Shuo Yang, Xingjun Wang, Chun Yuan

From arXiv; accepted by ICASSP 2023

In the real world, data tends to follow long-tailed distributions with respect to class or attribute, which motivates the challenging Long-Tailed Recognition (LTR) problem. In this paper, we revisit recent LTR methods with promising Vision Transformers (ViT). We find that 1) ViT is hard to train with long-tailed data, and 2) ViT learns generalized features in an unsupervised manner, such as masked generative training, on both long-tailed and balanced datasets. Hence, we propose to adopt unsupervised learning to utilize long-tailed data. Furthermore, we propose Predictive Distribution Calibration (PDC) as a novel metric for LTR, where the model tends to simply classify inputs into common classes. PDC quantitatively measures how well calibrated the model's predictive preferences are. On this basis, we find that many LTR approaches alleviate this preference only slightly, despite improving accuracy. Extensive experiments on benchmark datasets validate that PDC reflects the model's predictive preference precisely, which is consistent with the visualization.
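The abstract does not give the formula for PDC, so the sketch below is not the authors' metric. It only illustrates the underlying idea of a "predictive preference": on a balanced test set, a model that favors common classes predicts them more often than rare ones, and the gap between the predicted-class distribution and a uniform distribution can be quantified. The function names, the 4-class setup, the prediction counts, and the choice of KL divergence are all assumptions made for illustration.

```python
import math
from collections import Counter


def predicted_class_distribution(predictions, num_classes):
    """Return the fraction of predictions assigned to each class."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    counts = Counter(predictions)
    total = len(predictions)
    return [counts.get(c, 0) / total for c in range(num_classes)]


def kl_from_uniform(distribution):
    """KL(p || uniform) in nats; 0 means no preference among classes."""
    k = len(distribution)
    # Terms with p == 0 contribute nothing to the sum.
    return sum(p * math.log(p * k) for p in distribution if p > 0)


# Hypothetical predictions on a balanced 4-class test set (25 images per class).
# Class 0 plays the role of a common ("head") class, class 3 a rare ("tail") one.
biased_preds = [0] * 55 + [1] * 25 + [2] * 15 + [3] * 5
balanced_preds = [0] * 25 + [1] * 25 + [2] * 25 + [3] * 25

biased = predicted_class_distribution(biased_preds, num_classes=4)
balanced = predicted_class_distribution(balanced_preds, num_classes=4)

print("biased distribution:", biased)
print("KL from uniform (biased):", round(kl_from_uniform(biased), 4))
print("KL from uniform (balanced):", round(kl_from_uniform(balanced), 4))
```

A larger divergence means the model's predictions lean more heavily toward some classes. Accuracy alone would not reveal this, which is in line with the abstract's observation that accuracy can improve while the predictive preference is only slightly reduced.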
