Building a general, inclusive segmentation model that recognizes more categories across diverse scenarios is a meaningful and attractive goal. A straightforward approach is to combine existing fragmented segmentation datasets and train a single multi-dataset network. However, multi-dataset segmentation faces two major issues: (1) inconsistent taxonomies demand manual reconciliation to construct a unified taxonomy; (2) an inflexible one-hot common taxonomy causes time-consuming model retraining and defective supervision of unlabeled categories. In this paper, we investigate multi-dataset segmentation and propose a scalable Language-guided Multi-dataset Segmentation framework, dubbed LMSeg, which supports both semantic and panoptic segmentation. Specifically, we introduce a pre-trained text encoder that maps category names to a text embedding space, which serves as a unified taxonomy in place of inflexible one-hot labels. The model dynamically aligns the segment queries with the category embeddings. Instead of relabeling each dataset with the unified taxonomy, we design a category-guided decoding module that dynamically guides predictions to each dataset's taxonomy. Furthermore, we adopt a dataset-aware augmentation strategy that assigns each dataset its own image augmentation pipeline, suited to the properties of that dataset's images. Extensive experiments demonstrate that our method achieves significant improvements on four semantic and three panoptic segmentation datasets, and an ablation study evaluates the effectiveness of each component.
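The idea of using a text embedding space as a unified taxonomy can be illustrated with a minimal sketch. This is not the LMSeg implementation: the hand-written vectors in `CATEGORY_EMBEDDINGS` are hypothetical stand-ins for the output of a pre-trained text encoder, the names `classify_queries` and `DATASET_TAXONOMIES` and the `temperature` value are assumptions made for illustration, and the category-guided decoding module is reduced here to simply restricting the comparison to the categories of one dataset.

```python
import math

# Hypothetical stand-ins for text-encoder outputs (one vector per category name).
CATEGORY_EMBEDDINGS = {
    "road": [1.0, 0.0, 0.0],
    "car": [0.0, 1.0, 0.0],
    "sky": [0.0, 0.0, 1.0],
    "person": [0.7, 0.7, 0.0],
}

# Each dataset keeps its own label set; no manual relabeling into a merged one-hot space.
DATASET_TAXONOMIES = {
    "dataset_a": ["road", "car", "sky"],
    "dataset_b": ["sky", "person"],
}


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def classify_queries(query_embeddings, dataset_name, temperature=0.07):
    """Align each segment query with the category embeddings of one dataset.

    Returns a list of (predicted_category, probabilities) pairs, where the
    probabilities are a softmax over temperature-scaled cosine similarities.
    """
    names = DATASET_TAXONOMIES[dataset_name]
    categories = [l2_normalize(CATEGORY_EMBEDDINGS[name]) for name in names]
    results = []
    for query in query_embeddings:
        q = l2_normalize(query)
        logits = [sum(a * b for a, b in zip(q, c)) / temperature for c in categories]
        peak = max(logits)  # subtract the maximum for numerical stability
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(names)), key=probs.__getitem__)
        results.append((names[best], probs))
    return results


# Two toy segment queries, scored against two different taxonomies.
queries = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]
predictions_a = [name for name, _ in classify_queries(queries, "dataset_a")]
predictions_b = [name for name, _ in classify_queries(queries, "dataset_b")]
```

Because categories are compared in a shared embedding space, the same queries can be scored against any dataset's label set, and adding a category only requires a new embedding rather than a change to a fixed one-hot output layer.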