We present SegGPT, a generalist model for segmenting everything in context. We unify various segmentation tasks into a single in-context learning framework that accommodates different kinds of segmentation data by converting them into a common image format. The training of SegGPT is formulated as an in-context coloring problem with a random color mapping for each data sample. Because the colors change from sample to sample, the model must accomplish diverse tasks according to the context rather than rely on specific colors. After training, SegGPT can perform arbitrary segmentation tasks in images or videos via in-context inference, covering targets such as object instances, stuff, parts, contours, and text. SegGPT is evaluated on a broad range of tasks, including few-shot semantic segmentation, video object segmentation, semantic segmentation, and panoptic segmentation. Our results show strong capabilities in segmenting in-domain and out-of-domain targets, both qualitatively and quantitatively.
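The random color mapping described above can be illustrated with a minimal sketch. This is not the SegGPT implementation: the helper names (`random_color_map`, `colorize`, `make_in_context_sample`), the use of integer label masks, and the convention that label 0 is background and stays black are all assumptions made for illustration. The point it shows is that the prompt mask and the target mask of one sample share a color mapping, while the mapping is redrawn for every sample.

```python
import numpy as np

def random_color_map(labels, rng):
    """Draw a distinct random RGB color for each label id (hypothetical helper)."""
    colors, used = {}, set()
    for label in labels:
        while True:
            color = tuple(int(c) for c in rng.integers(0, 256, size=3))
            # Reject duplicates and pure black, which is reserved for background here.
            if color not in used and color != (0, 0, 0):
                break
        used.add(color)
        colors[int(label)] = color
    return colors

def colorize(mask, color_map):
    """Render an integer label mask (H, W) as an RGB image (H, W, 3)."""
    out = np.zeros(mask.shape + (3,), dtype=np.uint8)  # background stays black
    for label, color in color_map.items():
        out[mask == label] = color
    return out

def make_in_context_sample(prompt_mask, target_mask, rng):
    """Color a prompt mask and a target mask with one shared random mapping."""
    labels = np.union1d(np.unique(prompt_mask), np.unique(target_mask))
    labels = labels[labels != 0]  # assumption: label 0 is background
    color_map = random_color_map(labels, rng)
    return colorize(prompt_mask, color_map), colorize(target_mask, color_map), color_map

rng = np.random.default_rng(0)
prompt_mask = np.array([[0, 1], [2, 2]])
target_mask = np.array([[1, 1], [0, 2]])
prompt_img, target_img, color_map = make_in_context_sample(prompt_mask, target_mask, rng)
```

Calling `make_in_context_sample` again on the same masks will generally yield different colors, so a model trained on such pairs cannot memorize a fixed color per class and has to read the mapping from the prompt.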