Video semantic segmentation (VSS) is beneficial for dealing with dynamic scenes due to the continuous property of the real-world environment. On the one hand, some methods alleviate the predicted inconsistent problem between continuous frames. On the other hand, other methods employ the previous frame as the prior information to assist in segmenting the current frame. Although the previous methods achieve superior performances on the independent and identically distributed (i.i.d) data, they can not generalize well on other unseen domains. Thus, we explore a new task, the video generalizable semantic segmentation (VGSS) task that considers both continuous frames and domain generalization. In this paper, we propose a class-wise non-salient region generalized (CNSG) framework for the VGSS task. Concretely, we first define the class-wise non-salient feature, which describes features of the class-wise non-salient region that carry more generalizable information. Then, we propose a class-wise non-salient feature reasoning strategy to select and enhance the most generalized channels adaptively. Finally, we propose an inter-frame non-salient centroid alignment loss to alleviate the predicted inconsistent problem in the VGSS task. We also extend our video-based framework to the image-based generalizable semantic segmentation (IGSS) task. Experiments demonstrate that our CNSG framework yields significant improvement in the VGSS and IGSS tasks.
翻译:由于真实世界环境的特性持续不断,视频语义分解(VSS)对于处理动态场景是有益的。一方面,一些方法缓解了连续框架之间预测的不一致问题。另一方面,其他方法使用以前的框架作为先前的信息,以协助分解当前框架。虽然以前的方法在独立和同样分布的(即d)数据上取得了优异的性能,但在其他看不见领域却不能非常概括。因此,我们探索了一个新的任务,即可视频通用语义分解(VGSS)任务,该任务既考虑连续框架,又考虑域域的概括化。在本文中,我们建议为VGSS任务建立一个等级上的非高度性通用区域(CNSG)框架。具体地说,我们首先确定等级上的非高度性特征,该特征描述类别上的非高度性非高度区域的特点,而该区域则包含更加广泛的信息。然后,我们提出一种基于等级的非高度性特征推理战略,以选择和加强最通用的渠道。最后,我们提议在 VGS 类中,我们提出了一个跨结构的不透明区域通用的GIS 总体图像分级调整任务。我们预测的图像分级平段任务。