Because fully supervised semantic segmentation methods require sufficient fully labeled data to work well and cannot generalize to unseen classes, few-shot segmentation has attracted considerable research attention. Prior methods extract features from support and query images and process them jointly before making predictions on the query images. Because the whole pipeline is built on convolutional neural networks (CNNs), only local information is exploited. In this paper, we propose a TRansformer-based Few-shot Semantic segmentation method (TRFS). Specifically, our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM). GEM adopts transformer blocks to exploit global information, while LEM uses conventional convolutions to exploit local information, across query and support features. GEM and LEM are complementary, helping the model learn better feature representations for segmenting query images. Extensive experiments on the PASCAL-5i and COCO datasets show that our approach achieves new state-of-the-art performance, demonstrating its effectiveness.
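The global/local split described above can be illustrated with a minimal numpy sketch. All shapes, the single-head attention form of GEM, the 3x3 mean filter standing in for LEM's learned convolution, and the additive fusion are assumptions for illustration, not the authors' exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 8  # channel dimension (assumed)
# Random projection weights for the attention sketch (assumed, untrained).
Wq, Wk, Wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gem(feat):
    """Global Enhancement Module sketch: one self-attention layer over
    all concatenated query+support positions, so every position can
    attend to global context across both images."""
    q, k, v = feat @ Wq, feat @ Wk, feat @ Wv
    return softmax(q @ k.T / np.sqrt(C)) @ v

def lem(feat, h, w):
    """Local Enhancement Module sketch: a 3x3 mean filter standing in
    for a learned convolution; each position sees only its 3x3
    spatial neighborhood."""
    x = np.pad(feat.reshape(h, w, C), ((1, 1), (1, 1), (0, 0)))
    out = sum(x[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    return out.reshape(h * w, C)

# Query and support feature maps, each flattened to (H*W, C).
h, w = 4, 4
query = rng.standard_normal((h * w, C))
support = rng.standard_normal((h * w, C))
joint = np.concatenate([query, support], axis=0)  # (2*H*W, C)

# Fuse the globally enhanced query positions with locally enhanced ones
# (additive fusion is an assumption).
enhanced = gem(joint)[: h * w] + lem(query, h, w)
print(enhanced.shape)  # (16, 8)
```

The key point of the sketch is the complementarity: `gem` mixes information across every query and support position at once, while `lem` only aggregates within a small spatial window.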