Dense prediction tasks are a fundamental class of problems in computer vision. Since supervised methods suffer from high pixel-wise labeling costs, a few-shot learning solution that can learn any dense task from a few labeled images is desired. Yet, current few-shot learning methods target a restricted set of tasks such as semantic segmentation, presumably due to the challenge of designing a general and unified model that can flexibly and efficiently adapt to arbitrary tasks of unseen semantics. We propose Visual Token Matching (VTM), a universal few-shot learner for arbitrary dense prediction tasks. It employs non-parametric matching on patch-level embedded tokens of images and labels, which encapsulates all tasks under a single framework. In addition, VTM flexibly adapts to any task with a tiny amount of task-specific parameters that modulate the matching algorithm. We implement VTM as a powerful hierarchical encoder-decoder architecture built on ViT backbones, where token matching is performed at multiple feature hierarchies. We evaluate VTM on a challenging variant of the Taskonomy dataset and observe that it robustly few-shot learns various unseen dense prediction tasks. Surprisingly, it is competitive with fully supervised baselines using only 10 labeled examples of novel tasks (0.004% of full supervision), and sometimes even outperforms them using 0.1% of full supervision. Code is available at https://github.com/GitGyun/visual_token_matching.
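To make the core idea concrete, the following is a minimal sketch of non-parametric token matching, not the paper's actual implementation: each query image token retrieves a similarity-weighted combination of support label tokens. All function and variable names here are hypothetical, and the real VTM model performs this matching inside a hierarchical ViT encoder-decoder with learned, task-modulated embeddings.

```python
import torch
import torch.nn.functional as F

def token_matching(query_img_tokens, support_img_tokens, support_lbl_tokens, temperature=1.0):
    """Hypothetical simplification of patch-level token matching.

    query_img_tokens:   (N_q, D) patch tokens of the query image
    support_img_tokens: (N_s, D) patch tokens of the few labeled support images
    support_lbl_tokens: (N_s, D) patch tokens of the corresponding dense labels

    Each query token's label embedding is predicted as a similarity-weighted
    combination of support label tokens, so the same mechanism can serve any
    dense prediction task given a handful of labeled examples.
    """
    # cosine similarity between query and support image tokens
    q = F.normalize(query_img_tokens, dim=-1)
    k = F.normalize(support_img_tokens, dim=-1)
    sim = (q @ k.t()) / temperature          # (N_q, N_s) matching scores
    weights = sim.softmax(dim=-1)            # non-parametric matching weights
    return weights @ support_lbl_tokens      # (N_q, D) predicted label tokens

# toy usage: 10 support images, 196 patches each, 256-dim tokens
pred = token_matching(
    torch.randn(196, 256),        # query image tokens
    torch.randn(10 * 196, 256),   # support image tokens
    torch.randn(10 * 196, 256),   # support label tokens
)
print(pred.shape)  # torch.Size([196, 256])
```

In this sketch the only task-dependent components are the token embeddings themselves; in VTM, a small set of task-specific parameters modulates these embeddings so the same matching procedure adapts to each novel task.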