A few-shot semantic segmentation model is typically composed of a CNN encoder, a CNN decoder and a simple classifier (separating foreground and background pixels). Most existing methods meta-learn all three model components for fast adaptation to a new class. However, given that as few as a single support-set image is available, effective model adaptation of all three components to the new class is extremely challenging. In this work we propose to simplify the meta-learning task by focusing solely on the simplest component, the classifier, whilst leaving the encoder and decoder to pre-training. We hypothesize that if we pre-train an off-the-shelf segmentation model over a set of diverse training classes with sufficient annotations, the encoder and decoder can capture rich discriminative features applicable to any unseen class, rendering the subsequent meta-learning stage unnecessary. For classifier meta-learning, we introduce a Classifier Weight Transformer (CWT) designed to dynamically adapt the support-set trained classifier's weights to each query image in an inductive way. Extensive experiments on two standard benchmarks show that despite its simplicity, our method outperforms the state-of-the-art alternatives, often by a large margin. Code is available at https://github.com/zhiheLu/CWT-for-FSS.
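To make the core idea concrete, below is a minimal PyTorch sketch of a classifier-weight-adaptation module in the spirit of CWT: the support-set trained classifier weights serve as the attention query, while the flattened features of a single query image serve as key and value, so the weights are adapted to that query image inductively. The class name, layer sizes, and the residual update are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn


class ClassifierWeightTransformer(nn.Module):
    """Hypothetical sketch: adapt classifier weights to one query image via attention."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)   # projects classifier weights (attention query)
        self.to_k = nn.Linear(dim, dim)   # projects query-image features (keys)
        self.to_v = nn.Linear(dim, dim)   # projects query-image features (values)
        self.proj = nn.Linear(dim, dim)
        self.norm = nn.LayerNorm(dim)
        self.scale = dim ** -0.5

    def forward(self, cls_weights: torch.Tensor, feats: torch.Tensor) -> torch.Tensor:
        # cls_weights: (n_classes, dim) -- e.g. 2 rows for foreground/background
        # feats:       (hw, dim)        -- flattened features of ONE query image
        q = self.to_q(cls_weights)                        # (n_classes, dim)
        k, v = self.to_k(feats), self.to_v(feats)         # (hw, dim)
        attn = torch.softmax(q @ k.t() * self.scale, -1)  # (n_classes, hw)
        # Residual update keeps the adapted weights close to the support-trained ones.
        return self.norm(cls_weights + self.proj(attn @ v))


# Usage sketch: adapt a 2-way (foreground/background) classifier to a query image.
cwt = ClassifierWeightTransformer(dim=256)
w = torch.randn(2, 256)        # support-set trained classifier weights (assumed shape)
f = torch.randn(60 * 60, 256)  # query features flattened to (H*W, dim)
w_adapted = cwt(w, f)          # (2, 256), then used to classify the query pixels
```

Because each query image produces its own adapted weights, no information flows between query images, which is what makes the adaptation inductive rather than transductive.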