In visual recognition tasks, few-shot learning is the ability to learn new object categories from only a few support examples. Its resurgence with the rise of deep learning has largely centered on image classification. This work focuses on few-shot semantic segmentation, which remains largely unexplored; the few recent advances are often restricted to single-class few-shot segmentation. In this paper, we first present a novel multi-way (multi-class) encoding and decoding architecture that effectively fuses multi-scale query information and multi-class support information into a single query-support embedding, from which multi-class segmentation is directly decoded. For better feature fusion, a multi-level attention mechanism is proposed within the architecture, comprising attention for support feature modulation and attention for multi-scale combination. Finally, to enhance learning of the embedding space, an additional pixel-wise metric learning module is introduced, with a triplet loss formulated on the pixel-level embeddings of the input image. Extensive experiments on the standard benchmarks PASCAL-5i and COCO-20i show clear benefits of our method over the state of the art in few-shot segmentation.
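The pixel-wise metric learning module mentioned above can be illustrated with a minimal sketch of a hinge-style triplet loss applied to per-pixel embeddings. This is a generic illustration, not the paper's implementation: the margin value, the distance metric, and the triplet sampling strategy are assumptions here, and the embeddings are toy 2-D vectors rather than real network features.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge triplet loss for one (anchor, positive, negative) pixel triplet.

    Pulls same-class pixel embeddings together and pushes different-class
    embeddings at least `margin` farther apart. The margin of 0.5 is an
    illustrative assumption, not a value from the paper.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy pixel embeddings (2-D for illustration only).
anchor   = [1.0, 0.0]   # a pixel of the target class
positive = [0.9, 0.1]   # another pixel of the same class
negative = [0.0, 1.0]   # a pixel of a different class

loss = triplet_loss(anchor, positive, negative)
```

In this toy case the negative is already far enough from the anchor, so the hinge clips the loss to zero; a hard negative close to the anchor would yield a positive loss that drives the embedding space apart.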