Few-shot classification, which aims to recognize unseen classes from very limited samples, has attracted increasing attention. It is usually formulated as a metric learning problem. The core issues of few-shot classification are how to learn (1) consistent representations for images in both the support and query sets and (2) an effective metric between support and query images. In this paper, we show that these two challenges can be modeled simultaneously via a unified Query-Support TransFormer (QSFormer) model. To be specific, the proposed QSFormer involves a global query-support sample Transformer (sampleFormer) branch and a local patch Transformer (patchFormer) branch. sampleFormer aims to capture the dependence between samples in the support and query sets for image representation. It adopts an Encoder, a Decoder and Cross-Attention to model, respectively, the support representation, the query representation and the metric learning of the few-shot classification task. As a complement to this global learning branch, we adopt a local patch Transformer to extract a structural representation for each image by capturing the long-range dependence of local image patches. In addition, a novel Cross-scale Interactive Feature Extractor (CIFE) is proposed to extract and fuse multi-scale CNN features, serving as an effective backbone module for the proposed few-shot learning method. All modules are integrated into a unified framework and trained in an end-to-end manner. Extensive experiments on four popular datasets demonstrate the effectiveness and superiority of the proposed QSFormer.
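The core mechanism of the sampleFormer branch is cross-attention between query-set and support-set sample embeddings. The following is a minimal sketch of that idea, assuming pre-extracted d-dimensional sample embeddings; the function names, the toy 5-way 1-shot episode, and the cosine-similarity readout are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, supports, d):
    # queries:  (Q, d) query-set sample embeddings
    # supports: (S, d) support-set sample embeddings
    attn = softmax(queries @ supports.T / np.sqrt(d), axis=-1)  # (Q, S)
    # each query embedding is contextualized by the support set
    return attn @ supports

# toy 5-way 1-shot episode: 5 support samples, 3 query samples, d = 64
rng = np.random.default_rng(0)
d = 64
support = rng.normal(size=(5, d))  # one embedding per class (1-shot)
query = rng.normal(size=(3, d))
ctx = cross_attention(query, support, d)

# metric: cosine similarity between contextualized queries and supports
sim = (ctx / np.linalg.norm(ctx, axis=1, keepdims=True)) @ \
      (support / np.linalg.norm(support, axis=1, keepdims=True)).T
pred = sim.argmax(axis=1)  # predicted class index for each query
print(pred)
```

In the full model, these embeddings would come from the CIFE backbone and be refined by the patchFormer branch before the query-support metric is computed.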