Few-shot Semantic Segmentation (FSS) aims to segment unseen classes in a query image by referring to only a few annotated examples, called support images. A defining characteristic of FSS is the spatial inconsistency between query and support targets, e.g., in texture or appearance. This strongly challenges the generalization ability of FSS methods, which must effectively exploit the dependency between the query image and the support examples. Most existing methods abstract support features into prototype vectors and let them interact with query features through cosine similarity or feature concatenation. However, this simple interaction may fail to capture spatial details in the query features. To alleviate this limitation, a few methods use all pixel-wise support information by computing pixel-wise correlations between paired query and support features with the Transformer attention mechanism. These approaches suffer from the heavy computation of dot-product attention between all pixels of the support and query features. In this paper, we propose ProtoFormer, a simple yet effective Transformer-based framework that fully captures spatial details in query features. It treats the abstracted prototype of the target class in the support features as the Query and the query features as the Key and Value embeddings, which are fed to a Transformer decoder. In this way, spatial details are better captured and the model can focus on the semantic features of the target class in the query image. The output of the Transformer-based module can be viewed as semantic-aware dynamic kernels that filter the segmentation mask out of the enriched query features. Extensive experiments on PASCAL-$5^{i}$ and COCO-$20^{i}$ show that ProtoFormer significantly improves on state-of-the-art methods.
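The prototype-as-Query idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes masked average pooling to build the prototype, a single attention head with no learned projections, no feed-forward or normalization layers, and a plain dot-product threshold in place of the dynamic-kernel filtering step. All function names and the toy features are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def masked_average_pooling(features, mask):
    """Average the support pixel vectors selected by a binary mask.

    features: list of N pixel vectors, each of length C.
    mask: list of N values in {0, 1} marking target-class pixels.
    """
    total = sum(mask)
    if total == 0:
        raise ValueError("support mask selects no pixels")
    channels = len(features[0])
    return [
        sum(m * f[c] for f, m in zip(features, mask)) / total
        for c in range(channels)
    ]


def cross_attention(prototype, query_features):
    """One attention step: the prototype is the Query, and the
    query-image pixels serve as both Key and Value (assumption:
    identity projections, single head, residual connection)."""
    scale = math.sqrt(len(prototype))
    weights = softmax([dot(prototype, k) / scale for k in query_features])
    attended = [
        sum(w * v[c] for w, v in zip(weights, query_features))
        for c in range(len(prototype))
    ]
    return [p + a for p, a in zip(prototype, attended)]


def predict_mask(kernel, query_features, threshold=0.0):
    """Apply the attended vector as a 1x1 kernel over query pixels
    and threshold the response (the threshold is an assumption)."""
    return [1 if dot(kernel, f) > threshold else 0 for f in query_features]


# Toy data: 4 support pixels and 4 query pixels with 2 channels each.
support_features = [[1.0, 0.2], [-1.0, 0.1], [1.0, 0.2], [-1.0, 0.1]]
support_mask = [1, 0, 1, 0]
query_features = [[-1.0, 0.1], [0.9, 0.3], [1.1, 0.1], [-0.8, 0.0]]

prototype = masked_average_pooling(support_features, support_mask)
kernel = cross_attention(prototype, query_features)
print(predict_mask(kernel, query_features))
```

Because the single prototype attends over the N query pixels, the attention cost grows with N rather than with the product of support and query pixel counts, which is the computational contrast the abstract draws with pixel-wise correlation methods.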